How to Calculate Z-Score in Pandas: Simple Guide
To calculate the z-score in pandas, subtract the mean of the column from each value and then divide by the column's standard deviation. You can do this easily with df['z_score'] = (df['column'] - df['column'].mean()) / df['column'].std().
Syntax
The formula to calculate the z-score for a pandas DataFrame column has the following parts:
- df['column']: The data column you want to transform.
- df['column'].mean(): The average value of the column.
- df['column'].std(): The standard deviation of the column. By default pandas computes the sample standard deviation (ddof=1).
- The expression (df['column'] - df['column'].mean()) / df['column'].std() computes the z-score for each value.
python
df['z_score'] = (df['column'] - df['column'].mean()) / df['column'].std()
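Because std() in pandas defaults to the sample standard deviation, it can help to make that choice explicit. The following is a minimal sketch, not a pandas API: the helper name zscore and its ddof parameter are assumptions introduced for illustration.

```python
import pandas as pd


def zscore(series: pd.Series, ddof: int = 1) -> pd.Series:
    """Return (x - mean) / std for each value of a numeric Series.

    zscore is a hypothetical helper name. ddof=1 matches the pandas
    default (sample standard deviation); ddof=0 uses the population
    standard deviation instead.
    """
    return (series - series.mean()) / series.std(ddof=ddof)


df = pd.DataFrame({"column": [10, 20, 30, 40, 50]})
df["z_sample"] = zscore(df["column"])              # sample std (ddof=1)
df["z_population"] = zscore(df["column"], ddof=0)  # population std (ddof=0)
print(df)
```

For this data, the first value is about -1.264911 with ddof=1 and about -1.414214 with ddof=0, so the choice matters for small samples.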
Example
This example shows how to calculate the z-score for a numeric column in a pandas DataFrame. It creates a simple DataFrame, calculates the z-score, and adds it as a new column.
python
import pandas as pd

data = {'score': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

df['z_score'] = (df['score'] - df['score'].mean()) / df['score'].std()
print(df)
Output
score z_score
0 10 -1.264911
1 20 -0.632456
2 30 0.000000
3 40 0.632456
4 50 1.264911
Common Pitfalls
Common mistakes when calculating z-scores in pandas include:
- Forgetting to use parentheses properly, which can cause wrong order of operations.
- Calculating mean and std on the wrong axis or on the entire DataFrame instead of a single column.
- Not handling missing values, which can cause NaN results.
Always check your data and use column-wise operations.
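The parentheses pitfall from the list above can be seen directly. This is a small sketch that assumes a score column like the one used elsewhere in this guide:

```python
import pandas as pd

df = pd.DataFrame({"score": [10, 20, 30, 40, 50]})
mean = df["score"].mean()
std = df["score"].std()

# Wrong: division binds tighter than subtraction, so this computes
# score - (mean / std), which is not a z-score.
wrong = df["score"] - mean / std

# Right: subtract the mean first, then divide by the standard deviation.
right = (df["score"] - mean) / std

print(pd.DataFrame({"wrong": wrong, "right": right}))
```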
python
import pandas as pd

data = {'score': [10, 20, None, 40, 50]}
df = pd.DataFrame(data)

# Wrong: mean and std on entire DataFrame (will cause error or wrong result)
# df['z_score'] = (df['score'] - df.mean()) / df.std()

# Right: calculate on the column; mean() and std() skip NaN by default,
# so the row with the missing value simply gets a NaN z-score
mean = df['score'].mean()
std = df['score'].std()
df['z_score'] = (df['score'] - mean) / std
print(df)
Output
score z_score
0 10.0 -1.095445
1 20.0 -0.547723
2 NaN NaN
3 40.0 0.547723
4 50.0 1.095445
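pandas skips NaN by default when computing mean() and std(), so a missing value does not break the statistics, but its own z-score stays NaN. If the missing rows are not needed, one option is to drop them first. The sketch below assumes that dropping rows is acceptable for your data. The variable name clean is chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({"score": [10, 20, None, 40, 50]})

# Keep only rows where 'score' is present; copy() gives an independent
# DataFrame so adding a column does not touch the original.
clean = df.dropna(subset=["score"]).copy()
clean["z_score"] = (clean["score"] - clean["score"].mean()) / clean["score"].std()

print(clean)
```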
Quick Reference
Remember these tips when calculating z-scores in pandas:
- Use df['column'].mean() and df['column'].std() for column-wise calculations.
- Subtract the mean before dividing by the standard deviation.
- Handle missing values to avoid NaN in results.
- Z-score standardizes data to have mean 0 and std 1.
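The claim that z-scores have mean 0 and standard deviation 1 can be checked numerically. A minimal sketch, assuming the same five-value score column and using a tolerance because floating-point results are rarely exact:

```python
import math

import pandas as pd

df = pd.DataFrame({"score": [10, 20, 30, 40, 50]})
z = (df["score"] - df["score"].mean()) / df["score"].std()

# Compare with a tolerance instead of == to allow for rounding error.
mean_is_zero = math.isclose(z.mean(), 0.0, abs_tol=1e-12)
std_is_one = math.isclose(z.std(), 1.0, rel_tol=1e-9)

print(mean_is_zero, std_is_one)
```

The standard deviation comes out as 1 only when the same ddof is used for the check as for the standardization.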
Key Takeaways
Calculate z-score by subtracting the column mean and dividing by the column standard deviation.
Always perform calculations on individual columns, not the whole DataFrame.
Handle missing values to prevent errors or NaNs in your z-score results.
Z-score standardizes data, making it easier to compare values across different scales.
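To illustrate the last point, here is a sketch that standardizes two columns measured on very different scales. The column names height_cm and income_usd and their values are made up for this example. apply() passes each column to the function separately, so the calculation stays column-wise.

```python
import pandas as pd

df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "income_usd": [20000, 35000, 50000, 65000, 80000],
})

# apply() with the default axis=0 calls the function once per column,
# so each column is standardized with its own mean and std.
z = df.apply(lambda col: (col - col.mean()) / col.std())

print(z)
```

Although the raw numbers differ by orders of magnitude, 150 cm and 20000 USD both end up about 1.26 standard deviations below their column means, which makes them directly comparable.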