
How to Calculate Variance Using NumPy in Python

Use numpy.var() to calculate the variance of a dataset in NumPy. Variance measures how spread out the values are around their mean. By default the function computes it over the entire flattened array; pass the axis argument to compute one variance per column or per row instead.

Syntax

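The basic call shape of numpy.var() appears in the code line at the end of this section. It is a simplified view: the full signature also accepts dtype, out, and where.

The parameters covered here are: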

  • a: input array of numbers
  • axis: axis along which to compute variance; None means flatten the array
  • ddof: Delta Degrees of Freedom; the divisor used is N - ddof, default is 0
  • keepdims: if True, retains reduced dimensions with size 1
python
numpy.var(a, axis=None, ddof=0, keepdims=False)
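
The N - ddof divisor described above can be checked by hand. The following is a minimal sketch, not NumPy's internal implementation: manual_var is a hypothetical helper written only for this illustration, and it divides the sum of squared deviations from the mean by N - ddof.

```python
import numpy as np

def manual_var(values, ddof=0):
    """Hypothetical helper: sum of squared deviations divided by (N - ddof)."""
    values = np.asarray(values, dtype=float)
    deviations = values - values.mean()
    return (deviations ** 2).sum() / (values.size - ddof)

data = np.array([1, 2, 3, 4, 5])

manual_population = manual_var(data)        # divisor is N = 5
manual_sample = manual_var(data, ddof=1)    # divisor is N - 1 = 4

# Each pair should agree if the helper matches numpy.var() for 1D input
print(manual_population, np.var(data))
print(manual_sample, np.var(data, ddof=1))
```

The helper only handles the flattened case; numpy.var() itself also supports axis and keepdims.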

Example

This example shows how to calculate the variance of a 1D and a 2D array using numpy.var(). It also shows how the axis argument changes which values are grouped together.

python
import numpy as np

# 1D array
arr1 = np.array([1, 2, 3, 4, 5])
var1 = np.var(arr1)

# 2D array
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
var2_all = np.var(arr2)  # variance of all elements
var2_axis0 = np.var(arr2, axis=0)  # one variance per column (computed down the rows)
var2_axis1 = np.var(arr2, axis=1)  # one variance per row (computed across the columns)

print(f"Variance of arr1: {var1}")
print(f"Variance of all elements in arr2: {var2_all}")
print(f"Variance along columns in arr2: {var2_axis0}")
print(f"Variance along rows in arr2: {var2_axis1}")
Output
Variance of arr1: 2.0
Variance of all elements in arr2: 2.9166666666666665
Variance along columns in arr2: [2.25 2.25 2.25]
Variance along rows in arr2: [0.66666667 0.66666667]
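
The effect of axis and keepdims on the shape of the result may be easier to see directly. This is a small sketch under the assumption of a 2x3 input; the variable names (grid, per_row_kept, standardized) are illustrative only.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

per_column = np.var(grid, axis=0)                    # axis 0 is reduced away
per_row = np.var(grid, axis=1)                       # axis 1 is reduced away
per_row_kept = np.var(grid, axis=1, keepdims=True)   # reduced axis kept with size 1

print(per_column.shape, per_row.shape, per_row_kept.shape)

# Because keepdims preserves the reduced axis, the result broadcasts
# against the original array, e.g. to standardize each row.
row_means = grid.mean(axis=1, keepdims=True)
standardized = (grid - row_means) / np.sqrt(per_row_kept)
print(np.var(standardized, axis=1))
```

Without keepdims=True, per_row has shape (2,) and would not broadcast row-wise against the (2, 3) array.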

Common Pitfalls

Common mistakes when calculating variance with NumPy include:

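  • Not setting ddof=1 when you want the sample variance instead of the population variance. By default, ddof=0 calculates the population variance.
  • Confusing the axis parameter: in a 2D array, axis=0 returns one variance per column, while axis=1 returns one variance per row.
  • Forgetting that variance is sensitive to data scale and outliers: it is expressed in squared units, so multiplying the data by a factor c multiplies the variance by c squared, and a single extreme value can dominate the result.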
python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Default ddof=0 gives the population variance
# (wrong only if the data is a sample and you want an unbiased estimate)
pop_var = np.var(arr)

# ddof=1 gives the sample variance (divisor N - 1)
sample_var = np.var(arr, ddof=1)

print(f"Population variance (ddof=0): {pop_var}")
print(f"Sample variance (ddof=1): {sample_var}")
Output
Population variance (ddof=0): 2.0
Sample variance (ddof=1): 2.5
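
The sensitivity to scale and outliers mentioned in the list above can be illustrated with a short sketch. The factor of 10 and the outlier value of 100 are arbitrary choices made for this example.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

base_var = np.var(values)

# Rescaling the data by 10 should multiply the variance by 10**2
scaled_var = np.var(values * 10)

# Appending one extreme value inflates the variance sharply
with_outlier = np.append(values, 100)
outlier_var = np.var(with_outlier)

print(f"Base variance: {base_var}")
print(f"Variance after scaling by 10: {scaled_var}")
print(f"Variance with one outlier added: {outlier_var}")
```

If the data comes in mixed units or contains suspect extreme values, it is usually worth checking both before comparing variances.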

Quick Reference

Summary tips for using numpy.var():

  • Use ddof=0 for population variance, ddof=1 for sample variance.
  • Set axis to calculate one variance per row (axis=1) or per column (axis=0) in 2D arrays.
  • Variance measures spread; higher values mean data points are more spread out.
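
One way to double-check the ddof rule from the first tip is to compare against Python's standard statistics module, where pvariance() is the population variance and variance() is the sample variance. This is only a cross-check sketch; the input list is made up for the illustration.

```python
import statistics
import numpy as np

sample = [1.0, 2.0, 3.0, 4.0, 5.0]

# Population variance: ddof=0 should match statistics.pvariance()
numpy_population = np.var(sample)
stdlib_population = statistics.pvariance(sample)

# Sample variance: ddof=1 should match statistics.variance()
numpy_sample = np.var(sample, ddof=1)
stdlib_sample = statistics.variance(sample)

print(numpy_population, stdlib_population)
print(numpy_sample, stdlib_sample)
```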

Key Takeaways

Use numpy.var() to calculate variance of arrays easily.
Set ddof=1 to get sample variance instead of population variance.
Use the axis parameter to calculate variance along specific dimensions.
Variance shows how spread out your data is around the mean.
Remember variance is sensitive to outliers and data scale.