How to Calculate Variance Using NumPy in Python
Use numpy.var() to calculate the variance of a dataset in NumPy. Variance measures how spread out the numbers are around their mean. By default, numpy.var() computes it over the entire (flattened) array; pass the axis parameter to compute it along rows or columns instead.
Syntax
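The basic syntax of numpy.var(), showing its most commonly used parameters, is:
```python
numpy.var(a, axis=None, ddof=0, keepdims=False)
```
Where:
- a: the input array, or an object that can be converted to one, containing numbers.
- axis: the axis (or axes) along which to compute the variance; None (the default) computes the variance of the flattened array.
- ddof: Delta Degrees of Freedom; the divisor used is N - ddof, where N is the number of elements. The default is 0.
- keepdims: if True, the reduced axes are kept in the result as dimensions of size 1.

numpy.var() also accepts other parameters, such as dtype, out, and where, which are omitted here.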
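To make the role of ddof concrete, here is a minimal sketch that computes the variance by hand, as the sum of squared deviations from the mean divided by N - ddof, and compares the result with np.var(). The helper manual_var is hypothetical and written only for illustration; it is not part of NumPy.

```python
import numpy as np

def manual_var(values, ddof=0):
    """Hypothetical helper: sum of squared deviations divided by (N - ddof)."""
    x = np.asarray(values, dtype=float)
    n = x.size
    squared_dev = (x - x.mean()) ** 2
    return squared_dev.sum() / (n - ddof)

data = np.array([1, 2, 3, 4, 5])

# Compare the hand-written formula with np.var for both common ddof values.
for ddof in (0, 1):
    print(f"ddof={ddof}: manual={manual_var(data, ddof=ddof)}, "
          f"np.var={np.var(data, ddof=ddof)}")
```

With ddof=0 both give 2.0 (10 / 5), and with ddof=1 both give 2.5 (10 / 4).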
Example
This example shows how to calculate the variance of 1D and 2D arrays using numpy.var(). It also demonstrates computing the variance along different axes.
```python
import numpy as np

# 1D array
arr1 = np.array([1, 2, 3, 4, 5])
var1 = np.var(arr1)

# 2D array
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
var2_all = np.var(arr2)            # variance of all elements (flattened)
var2_axis0 = np.var(arr2, axis=0)  # one variance per column
var2_axis1 = np.var(arr2, axis=1)  # one variance per row

print(f"Variance of arr1: {var1}")
print(f"Variance of all elements in arr2: {var2_all}")
print(f"Variance along columns in arr2: {var2_axis0}")
print(f"Variance along rows in arr2: {var2_axis1}")
```
Output
Variance of arr1: 2.0
Variance of all elements in arr2: 2.9166666666666665
Variance along columns in arr2: [2.25 2.25 2.25]
Variance along rows in arr2: [0.66666667 0.66666667]
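The keepdims parameter from the syntax section is not used in the example above. The sketch below, which reuses arr2, shows why it can be handy: with keepdims=True the per-row variances keep a trailing axis of size 1, so they broadcast back against the original array. Standardizing each row (subtracting its mean and dividing by the square root of its variance) is just one illustrative use; the variable names are arbitrary.

```python
import numpy as np

arr2 = np.array([[1, 2, 3], [4, 5, 6]])

row_var = np.var(arr2, axis=1)                      # shape (2,)
row_var_kept = np.var(arr2, axis=1, keepdims=True)  # shape (2, 1)
print(row_var.shape)       # (2,)
print(row_var_kept.shape)  # (2, 1)

# The (2, 1) result broadcasts against the (2, 3) array, one value per row.
# A (2,) result would not: NumPy aligns trailing axes, and 3 != 2.
row_mean = np.mean(arr2, axis=1, keepdims=True)
standardized = (arr2 - row_mean) / np.sqrt(row_var_kept)
print(standardized)
```

Each standardized row has a variance of 1, which is an easy way to check that the broadcasting lined up with the rows.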
Common Pitfalls
Common mistakes when calculating variance with NumPy include:
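- Not setting ddof=1 when you want the sample variance instead of the population variance. The default, ddof=0, calculates the population variance.
- Confusing the axis parameter, which changes the dimension along which variance is calculated. In a 2D array, axis=0 gives one variance per column and axis=1 gives one per row.
- Forgetting that variance is sensitive to data scale and outliers. It is measured in squared units, so multiplying the data by a constant c multiplies the variance by c², and a single extreme value can dominate the result.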
```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Default ddof=0: population variance (divides by N).
# Wrong if arr is a sample drawn from a larger population.
pop_var = np.var(arr)

# ddof=1: sample variance (divides by N - 1)
sample_var = np.var(arr, ddof=1)

print(f"Population variance (ddof=0): {pop_var}")
print(f"Sample variance (ddof=1): {sample_var}")
```
Output
Population variance (ddof=0): 2.0
Sample variance (ddof=1): 2.5
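The pitfalls list also warns that variance depends on scale and outliers. This sketch, using small made-up arrays, shows both effects: multiplying every value by 10 multiplies the variance by 100, and replacing a single value with an outlier inflates the variance sharply.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Variance is in squared units: scaling the data by c scales it by c**2.
print(np.var(arr))       # 2.0
print(np.var(arr * 10))  # 200.0

# Replacing 5 with 100 moves the mean to 22.0 and inflates the variance.
with_outlier = np.array([1, 2, 3, 4, 100])
print(np.var(with_outlier))  # 1522.0
```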
Quick Reference
Summary tips for using numpy.var():
- Use ddof=0 for population variance and ddof=1 for sample variance.
- Set axis to compute one variance per row (axis=1) or per column (axis=0) in 2D arrays.
- Variance measures spread: higher values mean the data points lie farther from the mean.
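As a quick illustration of that last tip, the two hypothetical arrays below share the same mean but differ in spread, and np.var() reports a much larger value for the wider one.

```python
import numpy as np

tight = np.array([4, 5, 6])
wide = np.array([0, 5, 10])

# Same mean, but the values in wide lie farther from it.
print(np.mean(tight), np.mean(wide))  # 5.0 5.0
print(np.var(tight), np.var(wide))    # about 0.667 and 16.667
```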
Key Takeaways
Use numpy.var() to calculate variance of arrays easily.
Set ddof=1 to get sample variance instead of population variance.
Use the axis parameter to calculate variance along specific dimensions.
Variance shows how spread out your data is around the mean.
Remember variance is sensitive to outliers and data scale.