How to Calculate Z-Score in Pandas: Simple Guide

To calculate the z-score in pandas, subtract the column mean from each value and divide the result by the column's standard deviation: df['z_score'] = (df['column'] - df['column'].mean()) / df['column'].std(). Note that std() in pandas returns the sample standard deviation (ddof=1) by default.

📐

Syntax

The z-score formula for a pandas DataFrame column is built from these parts:

df['column']: The data column you want to transform.
df['column'].mean(): The average value of the column.
df['column'].std(): The standard deviation of the column.
The expression (df['column'] - df['column'].mean()) / df['column'].std() computes the z-score for each value.

python

df['z_score'] = (df['column'] - df['column'].mean()) / df['column'].std()
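The syntax above relies on the pandas default for std(), which is the sample standard deviation (ddof=1). Some tools standardize with the population standard deviation instead, so the results can differ slightly. The sketch below compares the two on an illustrative Series; the variable names are assumptions made for this example.

```python
import pandas as pd

# Illustrative data, not taken from a real dataset
s = pd.Series([10, 20, 30, 40, 50])

# pandas default: sample standard deviation (ddof=1)
z_sample = (s - s.mean()) / s.std()

# Population standard deviation: pass ddof=0 explicitly
z_population = (s - s.mean()) / s.std(ddof=0)

print(z_sample.round(6).tolist())
print(z_population.round(6).tolist())
```

The population version gives values that are slightly larger in magnitude, because dividing by n instead of n - 1 produces a smaller standard deviation. Whichever convention you pick, use it consistently.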

💻

Example

This example shows how to calculate the z-score for a numeric column in a pandas DataFrame. It creates a simple DataFrame, calculates the z-score, and adds it as a new column.

python

import pandas as pd

data = {'score': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

df['z_score'] = (df['score'] - df['score'].mean()) / df['score'].std()
print(df)

Output

   score   z_score
0     10 -1.264911
1     20 -0.632456
2     30  0.000000
3     40  0.632456
4     50  1.264911
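A quick sanity check on the result is that the z-scores should have a mean of about 0 and a standard deviation of about 1. The following minimal sketch repeats the example and measures both; z_mean and z_std are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'score': [10, 20, 30, 40, 50]})
df['z_score'] = (df['score'] - df['score'].mean()) / df['score'].std()

# Standardized values should be centered on 0 with unit spread
z_mean = df['z_score'].mean()
z_std = df['z_score'].std()

print(round(z_mean, 6), round(z_std, 6))
```

Tiny deviations from exactly 0 and 1 are possible because of floating-point rounding, so compare with a tolerance rather than with ==.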

⚠️

Common Pitfalls

Common mistakes when calculating z-scores in pandas include:

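Omitting the parentheses around the subtraction. Division binds tighter than subtraction, so df['column'] - mean / std subtracts mean / std from each value instead of computing a z-score.
Calculating the mean and standard deviation on the wrong axis, or on the entire DataFrame instead of a single column.
Overlooking missing values. By default, mean() and std() skip NaN, but every row with a missing value still gets a NaN z-score.

Compute the statistics on the specific column you are standardizing, and check it for missing values first.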
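The first two pitfalls can be illustrated with a short sketch. It shows what the expression computes without parentheses, and one way to standardize several numeric columns at once without mixing up their statistics. The second DataFrame and its column names are hypothetical and exist only for this example.

```python
import pandas as pd

df = pd.DataFrame({'score': [10, 20, 30, 40, 50]})
mean = df['score'].mean()
std = df['score'].std()

# Missing parentheses: Python evaluates this as score - (mean / std)
wrong = df['score'] - mean / std

# Parentheses force the subtraction to happen first
right = (df['score'] - mean) / std

# Several numeric columns at once: DataFrame-minus-Series aligns the
# Series index with the column names, so each column uses its own statistics
scores = pd.DataFrame({'math': [10, 20, 30], 'art': [1.0, 2.0, 6.0]})  # hypothetical data
z_all = (scores - scores.mean()) / scores.std()

print(wrong.tolist())
print(right.round(6).tolist())
print(z_all.round(6))
```

The DataFrame-wide form works only when every selected column is numeric; select the numeric columns explicitly if the DataFrame also holds text or dates.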

python

import pandas as pd

data = {'score': [10, 20, None, 40, 50]}
df = pd.DataFrame(data)

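# Wrong: df.mean() and df.std() return Series indexed by column name,
# so the arithmetic aligns on labels and typically yields NaN, not z-scores
# df['z_score'] = (df['score'] - df.mean()) / df.std()

# Right: compute on the column; mean() and std() skip NaN by default,
# so only the row with the missing score gets a NaN z-score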
mean = df['score'].mean()
std = df['score'].std()
df['z_score'] = (df['score'] - mean) / std
print(df)

Output

   score   z_score
0   10.0 -1.095445
1   20.0 -0.547723
2    NaN       NaN
3   40.0  0.547723
4   50.0  1.095445
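If NaN rows are not wanted in the result, one option is to drop them before standardizing. The sketch below shows that approach, along with what happens when NaN skipping is turned off; clean and mean_strict are names chosen for this example, and dropping rows is only one of several reasonable strategies.

```python
import pandas as pd

df = pd.DataFrame({'score': [10, 20, None, 40, 50]})

# Option: drop rows with a missing score, then compute z-scores on the rest
clean = df.dropna(subset=['score']).copy()
clean['z_score'] = (clean['score'] - clean['score'].mean()) / clean['score'].std()
print(clean)

# With skipna=False the mean itself becomes NaN,
# so every z-score computed from it would be NaN as well
mean_strict = df['score'].mean(skipna=False)
print(mean_strict)
```

Filling missing values (for example with the column mean) is another possibility, but it changes the mean and standard deviation that the z-scores are based on, so choose it deliberately.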

📊

Quick Reference

Remember these tips when calculating z-scores in pandas:

Use df['column'].mean() and df['column'].std() for column-wise calculations.
Subtract the mean before dividing by the standard deviation.
Handle missing values to avoid NaN in results.
Z-score standardizes data to have mean 0 and std 1.

✅

Key Takeaways

Calculate z-score by subtracting the column mean and dividing by the column standard deviation.

Always perform calculations on individual columns, not the whole DataFrame.

Missing values do not raise errors: mean() and std() skip them by default, and the affected rows get NaN z-scores, so decide whether to drop or fill them first.

Z-score standardizes data, making it easier to compare values across different scales.