
How to Sample Data from a DataFrame in pandas: Simple Guide

Use the sample() method on a pandas DataFrame to randomly select rows. Specify the number of rows with n or the fraction of rows with frac. Either way, you get a random subset of your data in a single call.

📐

Syntax

The basic syntax of the sample() method is:

df.sample(n=None, frac=None, replace=False, random_state=None)

Where:

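n: Number of rows to return (integer). If neither n nor frac is given, one row is returned.
frac: Fraction of rows to return (float, for example 0.5 for half the rows). Values above 1 are allowed only with replace=True.
replace: Whether to sample with replacement (True or False). With replacement, the same row can be picked more than once.
random_state: Seed for reproducibility (an integer, a NumPy RandomState object, or None).

sample() also accepts other parameters, such as weights and axis, which this guide does not cover.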

python

df.sample(n=5, frac=None, replace=False, random_state=None)
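As a rough illustration of how n, frac, and replace affect the size of the result, the sketch below counts the rows returned by a few calls. The DataFrame and its column name `value` are made up for this example.

```python
import pandas as pd

# Hypothetical 10-row DataFrame, used only to count sampled rows
df = pd.DataFrame({"value": range(10)})

one_row = df.sample(random_state=0)            # neither n nor frac: one row
three_rows = df.sample(n=3, random_state=0)    # exactly 3 rows
fifth = df.sample(frac=0.2, random_state=0)    # 20% of 10 rows: 2 rows

# frac above 1 needs replace=True, because rows must repeat
oversampled = df.sample(frac=1.5, replace=True, random_state=0)

print(len(one_row), len(three_rows), len(fifth), len(oversampled))
# prints: 1 3 2 15
```

Only the row counts are shown here, since they follow directly from the arguments; which rows are picked depends on the seed.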

💻

Example

This example shows how to sample 3 random rows from a DataFrame and how to sample 50% of the rows. Because the DataFrame has 5 rows, frac=0.5 asks for 2.5 rows, which pandas rounds to 2.

python

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['NY', 'LA', 'Chicago', 'Houston', 'Phoenix']}

df = pd.DataFrame(data)

# Sample 3 random rows
sample_n = df.sample(n=3, random_state=1)

# Sample 50% of rows
sample_frac = df.sample(frac=0.5, random_state=1)

print('Sample 3 rows:')
print(sample_n)
print('\nSample 50% rows:')
print(sample_frac)

Output

Sample 3 rows:
      Name  Age     City
2  Charlie   35  Chicago
0    Alice   25       NY
3    David   40  Houston

Sample 50% rows:
      Name  Age     City
2  Charlie   35  Chicago
0    Alice   25       NY
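Two details of this output are worth checking for yourself: the same random_state returns the same rows every time, and sampled rows keep their original index labels. The sketch below, which reuses a trimmed version of the example data, also shows the common idiom of shuffling a whole DataFrame with frac=1. Using reset_index(drop=True) to renumber the rows is one option among several.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 35, 40, 45],
})

# Same seed, same rows in the same order
first = df.sample(n=3, random_state=1)
second = df.sample(n=3, random_state=1)
print(first.equals(second))  # True

# frac=1 returns every row once, in random order (a shuffle)
shuffled = df.sample(frac=1, random_state=1)
print(sorted(shuffled.index) == list(df.index))  # True

# Sampled rows keep their original labels; renumber them if needed
renumbered = shuffled.reset_index(drop=True)
print(list(renumbered.index))  # [0, 1, 2, 3, 4]
```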

⚠️

Common Pitfalls

Common mistakes when sampling data include:

Using both n and frac at the same time, which causes an error.
Not setting random_state when you want reproducible results.
Sampling more rows than exist without replace=True, which causes an error.

python

import pandas as pd

df = pd.DataFrame({'A': range(5)})

# Wrong: using both n and frac
# df.sample(n=2, frac=0.5)  # This will raise ValueError

# Correct: use only one
sample_correct = df.sample(n=2, random_state=42)

# Wrong: sampling more rows than exist without replacement
# df.sample(n=10)  # Raises ValueError

# Correct: use replace=True to allow duplicates
sample_replace = df.sample(n=10, replace=True, random_state=42)

print('Sample with n=2:')
print(sample_correct)
print('\nSample with replacement (n=10):')
print(sample_replace)

Output

Sample with n=2:
   A
1  1
4  4

Sample with replacement (n=10):
   A
1  1
4  4
1  1
1  1
2  2
4  4
1  1
2  2
4  4
2  2
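To see the two errors rather than leaving the failing calls commented out, you can wrap them in try/except. This is a minimal sketch; `try_sample` is a hypothetical helper written for this example, not part of pandas, and the exact error messages may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"A": range(5)})

def try_sample(**kwargs):
    """Hypothetical helper: return the sample, or the error text if sampling fails."""
    try:
        return df.sample(**kwargs)
    except ValueError as exc:
        return f"ValueError: {exc}"

# Both n and frac given: pandas refuses the call
print(try_sample(n=2, frac=0.5))

# More rows requested than exist, without replacement
print(try_sample(n=10))

# Allowed once replace=True lets rows repeat
print(len(try_sample(n=10, replace=True, random_state=42)))  # 10
```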

📊

Quick Reference

Parameter	Description	Example
n	Number of rows to sample	df.sample(n=5)
frac	Fraction of rows to sample	df.sample(frac=0.3)
replace	Sample with replacement	df.sample(n=10, replace=True)
random_state	Seed for reproducibility	df.sample(n=3, random_state=42)

✅

Key Takeaways

Use df.sample() to randomly select rows from a DataFrame.

Specify either n (number) or frac (fraction) but not both.

Set random_state for reproducible sampling results.

Use replace=True to sample with replacement when needed.

Sampling helps create smaller, random subsets for analysis or testing.
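One possible use of such a subset is a simple train/test split: sample a fraction of the rows, then drop those rows from the original to get the remainder. The sketch below assumes the DataFrame has a unique index, and the column names `feature` and `label` are invented for illustration.

```python
import pandas as pd

# Hypothetical dataset with a default (unique) integer index
df = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})

# Take 80% of the rows at random for training
train = df.sample(frac=0.8, random_state=42)

# The remaining rows are those whose index labels were not sampled
test = df.drop(train.index)

print(len(train), len(test))
# prints: 8 2
```

Dropping by index only works cleanly when index labels are unique; with duplicate labels, drop() would remove every row sharing a sampled label.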