
How to Select Random Rows in a pandas DataFrame

Use the DataFrame.sample() method to select random rows in pandas. Specify the number of rows with n or the fraction of rows with frac. For example, df.sample(n=3) returns 3 random rows.

Syntax

The sample() method selects random rows from a pandas DataFrame.

  • n: Number of rows to return (integer).
  • frac: Fraction of rows to return (float; typically between 0 and 1, and values above 1 require replace=True). Cannot be combined with n.
  • replace: Whether to sample with replacement (True or False).
  • random_state: Seed for reproducible results (integer).
python
# Pass either n or frac, not both
df.sample(n=None, frac=None, replace=False, random_state=None)
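
As a minimal sketch of how these parameters behave, the snippet below samples by count, by fraction, and with a fixed seed. The 10-row DataFrame and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical 10-row DataFrame used only for illustration
df = pd.DataFrame({"value": range(10)})

by_count = df.sample(n=4, random_state=0)          # exactly 4 rows
by_fraction = df.sample(frac=0.5, random_state=0)  # 50% of 10 rows -> 5 rows

# Reusing the same seed returns the same rows again
again = df.sample(n=4, random_state=0)

print(len(by_count), len(by_fraction))  # 4 5
print(by_count.equals(again))           # True
```

Without random_state, the two n=4 calls would usually return different rows.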

Example

This example shows how to select 3 random rows from a DataFrame and how to select 50% of the rows randomly.

python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 30, 35, 40, 45]}
df = pd.DataFrame(data)

# Select 3 random rows
random_rows = df.sample(n=3, random_state=42)

# Select 50% of rows randomly
random_fraction = df.sample(frac=0.5, random_state=42)

# sep='' keeps print from adding a stray space before the table header
print('Random 3 rows:\n', random_rows, sep='')
print('\nRandom 50% rows:\n', random_fraction, sep='')
Output
Random 3 rows:
      Name  Age
1      Bob   30
4      Eva   45
2  Charlie   35

Random 50% rows:
   Name  Age
1   Bob   30
4   Eva   45

Common Pitfalls

Common mistakes when selecting random rows include:

  • Using n larger than the DataFrame size without replace=True, which raises a ValueError.
  • Not setting random_state when reproducibility is needed, so each run returns different rows.
  • Passing both n and frac in the same call, which raises a ValueError (use only one).
python
import pandas as pd

data = {'A': [1, 2, 3]}
df = pd.DataFrame(data)

# Wrong: n larger than DataFrame size without replace
# df.sample(n=5)  # This will raise a ValueError

# Correct: use replace=True to allow repeats
sample_with_replace = df.sample(n=5, replace=True, random_state=1)
print(sample_with_replace)
Output
   A
1  2
0  1
2  3
1  2
1  2
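
The other rejected calls from the list above can be checked the same way. This is a small sketch, assuming the same three-row DataFrame. The flag names both_rejected and oversize_rejected are illustrative, and the exact error messages vary between pandas versions, so they are printed rather than quoted.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# Passing both n and frac is rejected by pandas
try:
    df.sample(n=2, frac=0.5)
    both_rejected = False
except ValueError as err:
    both_rejected = True
    print("ValueError:", err)

# Asking for more rows than exist, without replace=True, is also rejected
try:
    df.sample(n=5)
    oversize_rejected = False
except ValueError as err:
    oversize_rejected = True
    print("ValueError:", err)
```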

Quick Reference

Parameter      Description                  Example
n              Number of rows to sample     df.sample(n=3)
frac           Fraction of rows to sample   df.sample(frac=0.5)
replace        Sample with replacement      df.sample(n=5, replace=True)
random_state   Seed for reproducibility     df.sample(n=3, random_state=42)

Key Takeaways

  • Use df.sample() to select random rows from a DataFrame.
  • Specify either n (number) or frac (fraction), but not both.
  • Set random_state for reproducible random samples.
  • Use replace=True when sampling more rows than the DataFrame contains.
  • Check len(df) before sampling to avoid requesting more rows than exist.
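
One way to apply the size check is a small wrapper around sample(). The helper below, safe_sample, is a hypothetical name and not part of pandas. As a sketch, it assumes you would rather fall back to sampling with replacement than raise an error when n exceeds the row count.

```python
import pandas as pd

def safe_sample(df, n, random_state=None):
    """Hypothetical helper: sample n rows, allowing repeats only
    when n exceeds the number of rows available."""
    needs_replacement = n > len(df)
    return df.sample(n=n, replace=needs_replacement, random_state=random_state)

df = pd.DataFrame({"A": [1, 2, 3]})

small = safe_sample(df, 2, random_state=0)  # 2 distinct rows
large = safe_sample(df, 5, random_state=0)  # 5 rows, repeats allowed

print(len(small), small.index.is_unique)  # 2 True
print(len(large))                         # 5
```

If repeated rows are not acceptable in your use case, capping n with min(n, len(df)) is an alternative design.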