How to Select Random Rows in a pandas DataFrame
Use the DataFrame.sample() method to select random rows in pandas. Specify the number of rows with n or the fraction of rows with frac. For example, df.sample(n=3) returns 3 random rows.
Syntax
The sample() method selects random rows from a pandas DataFrame. Its most commonly used parameters are:
- n: Number of rows to return (integer).
- frac: Fraction of rows to return (float, normally between 0 and 1; values above 1 require replace=True).
- replace: Whether to sample with replacement (True or False).
- random_state: Seed for reproducible results (integer).
python
# Pass either n or frac, not both
df.sample(n=None, frac=None, replace=False, random_state=None)
Example
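This example shows how to select 3 random rows from a DataFrame and how to select 50% of the rows randomly. Because the DataFrame has 5 rows, frac=0.5 corresponds to 2.5 rows, which pandas rounds to a whole number, so 2 rows are returned.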
python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 30, 35, 40, 45]}
df = pd.DataFrame(data)

# Select 3 random rows
random_rows = df.sample(n=3, random_state=42)

# Select 50% of rows randomly
random_fraction = df.sample(frac=0.5, random_state=42)

print('Random 3 rows:\n', random_rows)
print('\nRandom 50% rows:\n', random_fraction)
Output
Random 3 rows:
        Name  Age
1      Bob   30
4      Eva   45
2  Charlie   35

Random 50% rows:
     Name  Age
1   Bob   30
4   Eva   45
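To illustrate the effect of random_state, the following minimal sketch samples the same DataFrame twice with the same seed and once without a seed. The variable names first, second, and unseeded are illustrative and are not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 45],
})

# Two calls with the same seed select the same rows in the same order
first = df.sample(n=3, random_state=42)
second = df.sample(n=3, random_state=42)
print(first.equals(second))

# Without a seed, the selected rows can differ between calls and between runs
unseeded = df.sample(n=3)
print(len(unseeded))
```

The first print should show True, because both seeded calls select identical rows. The unseeded sample always has 3 rows, but which rows it contains may change each time the script runs.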
Common Pitfalls
Common mistakes when selecting random rows include:
- Using n larger than the DataFrame size without replace=True, which raises a ValueError.
- Not setting random_state when reproducibility is needed, which leads to different results on each run.
- Passing both n and frac in the same call, which raises a ValueError (use only one of them).
python
import pandas as pd

data = {'A': [1, 2, 3]}
df = pd.DataFrame(data)

# Wrong: n larger than DataFrame size without replace
# df.sample(n=5)  # This will raise a ValueError

# Correct: use replace=True to allow repeats
sample_with_replace = df.sample(n=5, replace=True, random_state=1)
print(sample_with_replace)
Output
   A
1  2
0  1
2  3
1  2
1  2
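The pitfall of combining n and frac can be checked in the same way. The sketch below is a small illustration, assuming a three-row DataFrame like the one above; the names both_allowed, by_count, and by_fraction are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# Passing n and frac together is rejected by pandas with a ValueError
try:
    df.sample(n=2, frac=0.5)
    both_allowed = True
except ValueError as err:
    both_allowed = False
    print('ValueError:', err)

# Use one parameter at a time instead
by_count = df.sample(n=2, random_state=0)
by_fraction = df.sample(frac=1.0, random_state=0)  # all rows, in shuffled order
print(len(by_count), len(by_fraction))
```

Using frac=1.0 returns every row in random order, which is a common way to shuffle a DataFrame.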
Quick Reference
| Parameter | Description | Example |
|---|---|---|
| n | Number of rows to sample | df.sample(n=3) |
| frac | Fraction of rows to sample | df.sample(frac=0.5) |
| replace | Sample with replacement | df.sample(n=5, replace=True) |
| random_state | Seed for reproducibility | df.sample(n=3, random_state=42) |
Key Takeaways
- Use df.sample() to select random rows from a DataFrame.
- Specify either n (number) or frac (fraction), but not both.
- Set random_state for reproducible random samples.
- Use replace=True if sampling more rows than exist.
- Check len(df) before sampling with n so that a request for too many rows does not raise a ValueError.
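One way to check the DataFrame size before sampling is to cap n at the number of available rows. The helper below, sample_at_most, is a hypothetical function written for this illustration and is not part of pandas; it is a sketch of one possible approach rather than a recommended API.

```python
import pandas as pd

def sample_at_most(df, n, random_state=None):
    """Hypothetical helper: sample n rows, or every row if df has fewer than n."""
    k = min(n, len(df))  # never ask for more rows than exist
    return df.sample(n=k, random_state=random_state)

df = pd.DataFrame({'A': [1, 2, 3]})

small = sample_at_most(df, 2, random_state=1)   # 2 of the 3 rows
capped = sample_at_most(df, 5, random_state=1)  # capped at 3 rows, no error
print(len(small), len(capped))
```

Capping n keeps every sampled row unique. When repeated rows are acceptable, passing replace=True as described above is the alternative.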