How to Shuffle a DataFrame in pandas: Simple Guide
To shuffle the rows of a pandas DataFrame, use the sample() method with frac=1, which returns all rows in random order. For example, df.sample(frac=1) returns a shuffled copy of the DataFrame.
Syntax
The main method to shuffle a pandas DataFrame is sample(). It has these key parts:
- frac=1: returns 100% of the rows in random order.
- random_state: sets a seed so that the same shuffle is produced every time (optional).
- replace=False: ensures that no row is repeated (default behavior).
python
df.sample(frac=1, random_state=None, replace=False)
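To make the effect of random_state concrete, here is a minimal sketch. The DataFrame and its column names (id, score) are made up for illustration. It checks that the same seed gives the same order, and that the shuffle only reorders rows without changing them.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data, not from the guide).
df = pd.DataFrame({"id": [1, 2, 3, 4, 5], "score": [10, 20, 30, 40, 50]})

# The same seed produces the same row order on every call.
first = df.sample(frac=1, random_state=0)
second = df.sample(frac=1, random_state=0)
print(first.equals(second))

# A shuffle only reorders rows: sorting by the original index restores df.
print(first.sort_index().equals(df))
```

Both checks should print True. Leaving random_state as None would generally give a different order on each run.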
Example
This example shows how to shuffle all rows of a DataFrame and reset the index for a clean result.
python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 30, 35, 40, 45]}
df = pd.DataFrame(data)

# Shuffle all rows, then renumber the index from 0
shuffled_df = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled_df)
Output
      Name  Age
0      Bob   30
1      Eva   45
2  Charlie   35
3    Alice   25
4    David   40
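As an alternative to chaining reset_index(drop=True), sample() itself accepts an ignore_index argument. The sketch below assumes pandas 1.3 or newer, where that argument is available, and compares the two forms on a small made-up DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

# Assumption: pandas >= 1.3, where sample() supports ignore_index.
a = df.sample(frac=1, random_state=42, ignore_index=True)

# Equivalent two-step form that also works on older versions.
b = df.sample(frac=1, random_state=42).reset_index(drop=True)

print(a.equals(b))      # both forms give the same shuffled frame
print(list(a.index))    # index has been renumbered from 0
```

Either form works. The chained reset_index(drop=True) is the safer choice if the code has to run on older pandas installations.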
Common Pitfalls
Some common mistakes when shuffling a DataFrame:
- Using frac less than 1 returns only that fraction of the rows, not a full shuffle. Calling sample() with no arguments returns a single row.
- Forgetting to reset the index after shuffling keeps the old row labels, which can be confusing.
- Not setting random_state when you need reproducible shuffles.
python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Wrong: returns only about half of the rows (2 of the 3 here)
partial_shuffle = df.sample(frac=0.5)
print(partial_shuffle)

# Right: shuffle all rows and renumber the index
full_shuffle = df.sample(frac=1).reset_index(drop=True)
print(full_shuffle)
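Output
The exact rows and their order vary between runs because random_state is not set. With 3 rows, frac=0.5 yields 2 rows, because pandas rounds frac * len(df) to a whole number.
      Name  Age
1      Bob   30
2  Charlie   35
      Name  Age
0  Charlie   35
1    Alice   25
2      Bob   30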
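The first two pitfalls can be checked directly. The following is an illustrative sketch with an arbitrary seed. It shows that sample() with no arguments returns one row, and that a shuffled DataFrame keeps its original labels until the index is reset.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

# With no arguments, sample() returns a single random row, not a shuffle.
one_row = df.sample(random_state=1)
print(len(one_row))

# Without reset_index, shuffled rows keep their original labels,
# so label-based and position-based lookups can disagree.
shuffled = df.sample(frac=1, random_state=1)
print(shuffled.index.tolist())     # original labels, in shuffled order
print(shuffled.loc[0, "Name"])     # label 0 still refers to Alice
print(shuffled.iloc[0]["Name"])    # whichever row now sits in first position
```

After a shuffle, .loc[0] and .iloc[0] may refer to different rows. Resetting the index avoids that ambiguity.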
Quick Reference
| Parameter | Description | Default |
|---|---|---|
| frac | Fraction of rows to return; pass 1 to return all rows in random order | None |
| random_state | Seed for reproducible shuffling | None |
| replace | Allow sampling with replacement | False |
| reset_index(drop=True) | Separate DataFrame method (not a sample() parameter); renumbers the index from 0 after the shuffle | Must be called explicitly |
Key Takeaways
Use df.sample(frac=1) to shuffle all rows of a DataFrame.
Set random_state for reproducible shuffles.
Call reset_index(drop=True) after shuffling to reset row numbers.
Avoid using frac less than 1 if you want to shuffle the entire DataFrame.
Sampling with replace=True can duplicate rows, which is usually not what you want when shuffling.
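The difference between the two replace settings can be seen in a short sketch. The data and the seed are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David", "Eva"]})

# Default behavior (replace=False): a true shuffle.
no_repeat = df.sample(frac=1, random_state=7)

# Sampling with replacement: same length, but rows are drawn independently.
with_repeat = df.sample(frac=1, replace=True, random_state=7)

# Without replacement, every original row appears exactly once.
print(no_repeat["Name"].is_unique)

# With replacement, names may repeat and some original rows may be missing,
# so the number of distinct names can be smaller than the number of rows.
print(len(with_repeat), with_repeat["Name"].nunique())
```

For a shuffle, the default replace=False is the setting to keep. replace=True is intended for resampling tasks such as bootstrapping.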