How to Shuffle a DataFrame in pandas: Simple Guide

To shuffle rows in a pandas DataFrame, use the sample() method with frac=1 to return all rows in random order. For example, df.sample(frac=1) returns a shuffled DataFrame.

Syntax

The main method for shuffling a pandas DataFrame is sample(). Its key parameters are:

  • frac=1: means return 100% of rows in random order.
  • random_state: sets a seed number to get the same shuffle every time (optional).
  • replace=False: ensures rows are not repeated (default behavior).
python
df.sample(frac=1, random_state=None, replace=False)
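As a minimal sketch of what random_state does, the snippet below shuffles the same DataFrame twice with one seed and compares the results. The seed value 0 and the variable names are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
                   "Age": [25, 30, 35, 40, 45]})

# Same seed, same row order on every call
first = df.sample(frac=1, random_state=0)
second = df.sample(frac=1, random_state=0)
same_order = list(first.index) == list(second.index)

# A shuffle only reorders rows: every original label is still present once
same_rows = sorted(first.index) == list(df.index)

print(same_order, same_rows)  # True True
```

Leaving random_state=None draws a fresh order on each call, which is usually what you want outside of tests and reproducible experiments.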

Example

This example shows how to shuffle all rows of a DataFrame and reset the index for a clean result.

python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 30, 35, 40, 45]}
df = pd.DataFrame(data)

# Shuffle all rows
shuffled_df = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled_df)
Output (the exact row order may differ depending on your pandas and NumPy versions)
      Name  Age
0    David   40
1      Eva   45
2    Alice   25
3  Charlie   35
4      Bob   30
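To make the effect of reset_index(drop=True) concrete, here is a small sketch that compares the index before and after the reset. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
                   "Age": [25, 30, 35, 40, 45]})

shuffled = df.sample(frac=1, random_state=42)

# Without reset_index, each row keeps its original label (in shuffled order)
labels_kept = sorted(shuffled.index) == [0, 1, 2, 3, 4]

# drop=True discards the old labels; the default (drop=False)
# keeps them in a new column named "index"
clean = shuffled.reset_index(drop=True)
with_old = shuffled.reset_index()

print(list(clean.index))       # [0, 1, 2, 3, 4]
print(list(with_old.columns))  # ['index', 'Name', 'Age']
```

Keeping the old labels can be useful if you later need to map shuffled rows back to their original positions.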

Common Pitfalls

Some common mistakes when shuffling a DataFrame:

  • Omitting frac (and n) returns a single random row, and any frac below 1 returns only a subset of the rows, not a full shuffle.
  • Forgetting to reset the index after shuffling keeps the old row numbers, which can be confusing.
  • Not setting random_state if you want reproducible shuffles.
python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Wrong: frac=0.5 returns only about half of the rows
partial_shuffle = df.sample(frac=0.5)
print(partial_shuffle)

# Right: shuffle all rows
full_shuffle = df.sample(frac=1).reset_index(drop=True)
print(full_shuffle)
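Output (one possible result; no random_state is set, so the rows and their order change between runs)
    Name  Age
1    Bob   30
0  Alice   25
      Name  Age
0  Charlie   35
1    Alice   25
2      Bob   30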
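The row counts behind the first pitfall can be checked directly. This sketch uses a four-row DataFrame (an assumption made here so that frac=0.5 gives a whole number of rows) and compares three calls.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David"],
                   "Age": [25, 30, 35, 40]})

one_row = df.sample()            # no n or frac: a single random row
half = df.sample(frac=0.5)       # 50% of 4 rows: 2 rows
everything = df.sample(frac=1)   # all 4 rows, in random order

print(len(one_row), len(half), len(everything))  # 1 2 4
```

Only the frac=1 call is a shuffle in the sense used here: it returns every row exactly once.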

Quick Reference

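Parameter | Description | Default
frac | Fraction of rows to return in random order (use 1 for all rows) | None (with no n or frac, one row is returned)
random_state | Seed for reproducible shuffling | None
replace | Allow sampling with replacement | False
reset_index(drop=True) | Separate method, not a sample() parameter; resets the index after the shuffle for clean row numbers | Must be called explicitly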

Key Takeaways

Use df.sample(frac=1) to shuffle all rows of a DataFrame.
Set random_state for reproducible shuffles.
Call reset_index(drop=True) after shuffling to reset row numbers.
Avoid using frac less than 1 if you want to shuffle the entire DataFrame.
Sampling with replace=True can duplicate rows, usually not desired for shuffling.
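The last takeaway can be illustrated with a short sketch. It uses the n parameter of sample() (not covered above) to draw more rows than the DataFrame holds, which forces duplicates; the seed and variable names are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
                   "Age": [25, 30, 35, 40, 45]})

# Without replacement: every original row appears exactly once
shuffled = df.sample(frac=1, replace=False, random_state=1)
print(len(shuffled), shuffled.index.is_unique)        # 5 True

# With replacement: drawing 10 rows from 5 must repeat some of them
oversampled = df.sample(n=10, replace=True, random_state=1)
print(len(oversampled), oversampled.index.is_unique)  # 10 False
```

Even with frac=1, replace=True can repeat some rows and omit others, so it produces a resample of the data, not a shuffle.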