How to Select Random Rows in pandas DataFrame

Use the DataFrame.sample() method to select random rows in pandas. Specify the number of rows with n or the fraction of rows with frac. For example, df.sample(n=3) returns 3 random rows.

📐

Syntax

The sample() method returns a random subset of rows from a pandas DataFrame. Its most commonly used parameters are:

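n: Number of rows to return (integer). Defaults to 1 if neither n nor frac is given.
frac: Fraction of rows to return (float, usually between 0 and 1; values above 1 require replace=True). Cannot be combined with n.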
replace: Whether to sample with replacement (True or False).
random_state: Seed for reproducible results (integer).

python

df.sample(n=number_of_rows, replace=False, random_state=None)       # sample by count
df.sample(frac=fraction_of_rows, replace=False, random_state=None)  # or by fraction, never both

💻

Example

This example shows how to select 3 random rows from a DataFrame and how to select 50% of the rows randomly.

python

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 30, 35, 40, 45]}
df = pd.DataFrame(data)

# Select 3 random rows
random_rows = df.sample(n=3, random_state=42)

# Select 50% of rows randomly
random_fraction = df.sample(frac=0.5, random_state=42)

print('Random 3 rows:\n', random_rows)
print('\nRandom 50% rows:\n', random_fraction)

Output

Random 3 rows:
      Name  Age
1      Bob   30
4      Eva   45
2  Charlie   35

Random 50% rows:
  Name  Age
1  Bob   30
4  Eva   45
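To make the row counts in this output concrete, the sketch below rebuilds the same DataFrame and checks how many rows each call returns. It assumes pandas rounds frac * len(df) to the nearest integer (0.5 * 5 = 2.5, which Python's round() turns into 2); that matches the two rows shown above, but treat the rounding rule as an assumption rather than a documented guarantee.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
                   'Age': [25, 30, 35, 40, 45]})

# Sample by absolute count and by fraction, using the same seed as the example
by_count = df.sample(n=3, random_state=42)
by_fraction = df.sample(frac=0.5, random_state=42)

print(len(by_count))     # 3
print(len(by_fraction))  # 2

# Sampled rows keep their original index labels
print(by_count.index.isin(df.index).all())
```

Because the original index labels are kept, the sampled rows can still be matched back to the source DataFrame.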

⚠️

Common Pitfalls

Common mistakes when selecting random rows include:

Requesting more rows than the DataFrame contains without replace=True, which raises a ValueError.
Omitting random_state when reproducibility is needed, so each run returns different rows.
Passing both n and frac in the same call, which raises a ValueError (use one or the other).

python

import pandas as pd

data = {'A': [1, 2, 3]}
df = pd.DataFrame(data)

# Wrong: n larger than DataFrame size without replace
# df.sample(n=5)  # This will raise a ValueError

# Correct: use replace=True to allow repeats
sample_with_replace = df.sample(n=5, replace=True, random_state=1)
print(sample_with_replace)

Output

   A
1  2
0  1
2  3
1  2
1  2
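The other two pitfalls can be checked in the same way. The following is a minimal sketch, using a small made-up DataFrame, that confirms a fixed random_state gives identical samples and that passing n together with frac is rejected. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

# Same seed, same rows: the two samples are identical
first = df.sample(n=2, random_state=0)
second = df.sample(n=2, random_state=0)
same_rows = first.equals(second)
print(same_rows)  # True

# Passing both n and frac is rejected with a ValueError
try:
    df.sample(n=2, frac=0.5)
    both_error = None
except ValueError as exc:
    both_error = str(exc)
print(both_error)
```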

📊

Quick Reference

Parameter	Description	Example
n	Number of rows to sample	df.sample(n=3)
frac	Fraction of rows to sample	df.sample(frac=0.5)
replace	Sample with replacement	df.sample(n=5, replace=True)
random_state	Seed for reproducibility	df.sample(n=3, random_state=42)

✅

Key Takeaways

Use df.sample() to select random rows from a DataFrame.

Specify either n (number) or frac (fraction) but not both.

Set random_state for reproducible random samples.

Use replace=True if sampling more rows than exist.

Check len(df) before sampling when n might exceed the number of rows.
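One way to apply the size check is to cap the requested count at the number of rows available. The helper below, safe_sample, is a hypothetical name introduced for illustration and is not part of pandas; it is a sketch for the case where repeated rows are not wanted.

```python
import pandas as pd

def safe_sample(df, n, random_state=None):
    """Hypothetical helper: sample at most len(df) rows without replacement."""
    k = min(n, len(df))  # never ask for more rows than exist
    return df.sample(n=k, random_state=random_state)

df = pd.DataFrame({'A': [1, 2, 3]})

# Asking for 5 rows from a 3-row DataFrame returns all 3 rows, shuffled
result = safe_sample(df, n=5, random_state=1)
print(len(result))  # 3
```

If repeated rows are acceptable, passing replace=True to df.sample() is the simpler option.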