How to Use var in pandas for Variance Calculation
In pandas, you use the var() method on a DataFrame or Series to calculate the variance of numeric data. This method computes the sample variance by default, which measures how spread out the data is around the mean.
Syntax
The var() method can be called on a pandas DataFrame or Series. It has optional parameters to control the calculation.
- axis: Whether to calculate variance down each column (axis=0, default) or across each row (axis=1).
- ddof: Delta degrees of freedom. The divisor is N - ddof, so the default of 1 gives the sample variance.
- numeric_only: If True, only include numeric data.
python
# Signatures as of pandas 2.x
DataFrame.var(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)
Series.var(axis=None, skipna=True, ddof=1, numeric_only=False, **kwargs)
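As a minimal sketch of how ddof changes the result, the snippet below compares the default sample variance with the population variance. The Series values are made up for illustration and are not part of this article's dataset.

```python
import pandas as pd

# Illustrative values chosen so the arithmetic is easy to follow (mean is 5,
# and the squared deviations sum to 32)
s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

sample_var = s.var()            # ddof=1 (default): divides by N - 1 = 7
population_var = s.var(ddof=0)  # ddof=0: divides by N = 8

print(sample_var)      # 32 / 7, roughly 4.57
print(population_var)  # 4.0
```

The sample variance is always the larger of the two, and the gap shrinks as the number of observations grows.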
Example
This example shows how to calculate variance for each numeric column in a DataFrame using var(). It also shows variance for a single Series.
python
import pandas as pd

data = {
    'math': [90, 80, 70, 60, 85],
    'english': [88, 92, 85, 87, 90],
    'history': [75, 78, 80, 72, 70]
}
df = pd.DataFrame(data)

# Variance of each column
variance_df = df.var()

# Variance of math scores only
variance_math = df['math'].var()

print('Variance of each subject:')
print(variance_df)
print('\nVariance of math scores:')
print(variance_math)
Output
Variance of each subject:
math       145.0
english      7.3
history     17.0
dtype: float64

Variance of math scores:
145.0
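The sample variance that var() returns can be reproduced by hand: subtract the mean from each value, square the deviations, sum them, and divide by N - 1. The sketch below does this for the math scores from the example. The variable names are only for illustration.

```python
import pandas as pd

math_scores = pd.Series([90, 80, 70, 60, 85])

# Manual sample variance: sum of squared deviations divided by N - 1
mean = math_scores.mean()                        # 77.0
squared_deviations = (math_scores - mean) ** 2   # 169, 9, 49, 289, 64
manual_var = squared_deviations.sum() / (len(math_scores) - 1)  # 580 / 4

print(manual_var)         # 145.0
print(math_scores.var())  # 145.0
```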
Common Pitfalls
Common mistakes when using var() include:
- Forgetting that var() calculates sample variance by default (ddof=1), not population variance.
- Applying var() on non-numeric columns without setting numeric_only=True, which can cause errors.
- Misunderstanding the axis parameter, which changes the direction of calculation.
python
import pandas as pd

data = {'A': [1, 2, 3], 'B': ['x', 'y', 'z']}
df = pd.DataFrame(data)

# This will raise an error because column B is non-numeric
# df.var()

# Correct way: ignore non-numeric columns
variance = df.var(numeric_only=True)
print(variance)
Output
A 1.0
dtype: float64
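The axis pitfall can be illustrated in the same way. The sketch below uses a small hypothetical DataFrame (the columns a and b are made up for this illustration) to show that the default returns one value per column, while axis=1 returns one value per row.

```python
import pandas as pd

# Hypothetical data, chosen only to make the two directions easy to compare
df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 5, 7]})

col_var = df.var()        # axis=0 (default): one value per column
row_var = df.var(axis=1)  # axis=1: one value per row

print(col_var)  # a -> 1.0, b -> 4.0
print(row_var)  # rows 0, 1, 2 -> 2.0, 4.5, 8.0
```

Checking the index of the result (column names versus row labels) is a quick way to confirm which direction was used.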
Quick Reference
Summary tips for using var() in pandas:
- Use df.var() to get variance of each numeric column.
- Use series.var() for variance of a single column.
- Set ddof=0 for population variance.
- Use numeric_only=True to avoid errors with non-numeric data.
- Use axis=1 to calculate variance across rows instead of columns.
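One related point when mixing libraries: NumPy's np.var uses ddof=0 by default, so it returns the population variance unless told otherwise. The sketch below, using made-up values, shows how the two defaults differ and how to make them agree.

```python
import numpy as np
import pandas as pd

# Illustrative values: mean is 2.5, squared deviations sum to 5
values = [1, 2, 3, 4]
s = pd.Series(values)

pandas_default = s.var()             # ddof=1: 5 / 3
numpy_default = np.var(values)       # ddof=0: 5 / 4
numpy_sample = np.var(values, ddof=1)

print(pandas_default)
print(numpy_default)  # 1.25
print(numpy_sample)
```

Passing ddof explicitly in both libraries avoids any mismatch.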
Key Takeaways
- Use pandas var() method to calculate variance of numeric data in DataFrames or Series.
- By default, var() calculates sample variance with ddof=1.
- Set numeric_only=True to avoid errors from non-numeric columns.
- Use axis parameter to control calculation direction (columns or rows).
- Set ddof=0 to calculate population variance if needed.