What is sample() for random rows in Data Analysis Python?




Introduction

We use the pandas sample() method to pick random rows from a DataFrame (a table). This lets us inspect a small, random part of a large dataset without reading all of it.

You want to check a few random records from a large dataset to understand its structure.

You need to create a smaller dataset for testing or training a model.

You want to randomly select data points for a quick quality check.

You want to split data randomly for experiments or validation.

You want to shuffle data rows before analysis.
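The last two use cases, splitting and shuffling, can be sketched as follows. This is a minimal illustration, not a full validation workflow: the DataFrame, its column names, the 80/20 ratio, and the seed are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({"id": range(10), "value": [x * 1.5 for x in range(10)]})

# Shuffle: frac=1 returns every row exactly once, in random order
shuffled = df.sample(frac=1, random_state=0)

# Split: sample 80% of the rows for training,
# then keep the rows that were not sampled for testing
train = df.sample(frac=0.8, random_state=0)
test = df.drop(train.index)

print(len(shuffled), len(train), len(test))
```

Dropping by train.index works here because the index labels are unique; with duplicate labels the split would need a different approach.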

Syntax

Data Analysis Python

DataFrame.sample(n=None, frac=None, replace=False, random_state=None)

n is the number of rows to pick randomly. If neither n nor frac is given, one row is returned.

frac is the fraction of rows to pick (like 0.1 for 10%).

Examples

Pick 3 random rows from the DataFrame df.

Data Analysis Python

df.sample(n=3)

Pick a random 20% of the rows from df.

Data Analysis Python

df.sample(frac=0.2)

Pick 5 rows randomly with replacement, so rows can repeat.

Data Analysis Python

df.sample(n=5, replace=True)

Pick 4 random rows and get the same ones every time you run the code (for reproducibility).

Data Analysis Python

df.sample(n=4, random_state=42)
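The four calls above assume a DataFrame named df already exists. To try them in one go, here is a small self-contained sketch; the 10-row DataFrame and its column name x are placeholders chosen only so the row counts are easy to check.

```python
import pandas as pd

# Hypothetical 10-row DataFrame standing in for df
df = pd.DataFrame({"x": range(10)})

three_rows = df.sample(n=3)                   # 3 distinct rows
one_fifth = df.sample(frac=0.2)               # 20% of 10 rows, so 2 rows
with_repeats = df.sample(n=5, replace=True)   # 5 rows, duplicates possible

first = df.sample(n=4, random_state=42)
second = df.sample(n=4, random_state=42)      # same seed, so same rows

print(len(three_rows), len(one_fifth), len(with_repeats))
print(first.equals(second))
```

Without random_state, the first three samples will usually contain different rows on each run, but their sizes stay the same.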

Sample Program

This code creates a small table of names and ages. Then it picks 2 random rows from it. Using random_state=1 makes sure the same rows are picked every time you run it.

Data Analysis Python

import pandas as pd

# Create a simple DataFrame
data = {'Name': ['Anna', 'Bob', 'Cara', 'Dan', 'Eva'],
        'Age': [23, 35, 45, 29, 41]}
df = pd.DataFrame(data)

# Pick 2 random rows
sampled_rows = df.sample(n=2, random_state=1)

print(sampled_rows)
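One detail worth checking when running the program: the sampled rows keep the index labels they had in the original DataFrame. The sketch below repeats the same data and shows one possible way to renumber the result from 0; the variable names same_rows and renumbered are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anna', 'Bob', 'Cara', 'Dan', 'Eva'],
                   'Age': [23, 35, 45, 29, 41]})
sampled_rows = df.sample(n=2, random_state=1)

# Each sampled index label still points at the same row in df
same_rows = sampled_rows.equals(df.loc[sampled_rows.index])

# reset_index(drop=True) discards the old labels and renumbers from 0
renumbered = sampled_rows.reset_index(drop=True)

print(same_rows, list(renumbered.index))
```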


Important Notes

If you use frac, do not use n at the same time. pandas raises a ValueError when both are given.

Setting random_state to a fixed integer returns the same rows every time, which is useful when sharing results.

By default, replace=False, so no row appears more than once in the sample. This also means n cannot be larger than the number of rows unless you set replace=True.
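These notes can be checked with a short experiment. The sketch below uses a hypothetical 5-row DataFrame and a small helper, raises_value_error, which is not part of pandas and exists only for this example.

```python
import pandas as pd

# Hypothetical 5-row DataFrame
df = pd.DataFrame({"x": range(5)})

def raises_value_error(**kwargs):
    """Return True if df.sample(**kwargs) raises ValueError."""
    try:
        df.sample(**kwargs)
    except ValueError:
        return True
    return False

# Passing n and frac together is rejected
both_given = raises_value_error(n=2, frac=0.5)

# Asking for 6 of 5 rows fails without replacement...
too_many = raises_value_error(n=6)

# ...but is allowed when rows may repeat
ok_with_replace = not raises_value_error(n=6, replace=True)

# With the default replace=False, no row is picked twice
no_repeats = df.sample(n=5).index.is_unique

print(both_given, too_many, ok_with_replace, no_repeats)
```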

Summary

sample() helps pick random rows from data.

You can choose how many rows or what fraction to pick.

Use random_state to get repeatable random samples.