How to Calculate Covariance Using NumPy in Python

You can calculate covariance in NumPy with the numpy.cov() function, which computes the covariance matrix of the input data. Pass your data arrays as arguments, and it returns a matrix whose entry (i, j) is the covariance between variable i and variable j.

📐

Syntax

The basic syntax of numpy.cov() is:

numpy.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None)

Here:

m: Input data array (1D or 2D).
y: Optional second data array to compute covariance with m.
rowvar: If True, each row represents a variable; if False, each column is a variable.
bias: If True, normalization is by N, otherwise by N-1 (default).
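ddof: Delta degrees of freedom. If not None, normalization is by N - ddof, which overrides the value implied by bias.
fweights and aweights: Optional 1D weights, one per observation. fweights are integer frequency weights (how many times each observation is repeated); aweights are relative observation weights (larger for observations considered more important).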
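The effect of bias, ddof, and fweights described above can be checked with a minimal sketch. The arrays and the counts vector are illustrative values chosen here, not part of the NumPy API.

```python
import numpy as np

# Illustrative data (assumed for this sketch)
x = np.array([2.1, 2.5, 4.0, 3.6])
y = np.array([8, 10, 12, 14])
n = x.size

sample_cov = np.cov(x, y)                  # default: divide by N - 1
population_cov = np.cov(x, y, bias=True)   # divide by N
ddof0_cov = np.cov(x, y, ddof=0)           # ddof=0 also divides by N

# The two normalizations differ only by the factor (N - 1) / N
print(np.allclose(population_cov, sample_cov * (n - 1) / n))  # True
print(np.allclose(population_cov, ddof0_cov))                 # True

# fweights: integer counts, equivalent to repeating observations
counts = np.array([1, 2, 1, 1])            # hypothetical repeat counts
weighted_cov = np.cov(x, y, fweights=counts)
repeated_cov = np.cov(np.repeat(x, counts), np.repeat(y, counts))
print(np.allclose(weighted_cov, repeated_cov))                # True
```

Each check compares two routes to the same matrix, so it holds regardless of the specific numbers used.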


💻

Example

This example calculates the covariance matrix of two variables with numpy.cov(). The diagonal entries are the variances of x and y, and the off-diagonal entries are the covariance between x and y.

python

import numpy as np

# Two data arrays
x = np.array([2.1, 2.5, 4.0, 3.6])
y = np.array([8, 10, 12, 14])

# Calculate covariance matrix
cov_matrix = np.cov(x, y)
print(cov_matrix)

Output

[[0.80333333 2.        ]
 [2.         6.66666667]]
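To see where these numbers come from, the sample covariance can be computed by hand: subtract each variable's mean, multiply the deviations pairwise, sum, and divide by N - 1. The following is a sketch of that arithmetic using the same x and y; the helper variable names are chosen here for illustration.

```python
import numpy as np

x = np.array([2.1, 2.5, 4.0, 3.6])
y = np.array([8, 10, 12, 14])
n = x.size

# Deviations of each observation from its variable's mean
dx = x - x.mean()
dy = y - y.mean()

# Sample (N - 1) variances and covariance
var_x = np.sum(dx * dx) / (n - 1)
var_y = np.sum(dy * dy) / (n - 1)
cov_xy = np.sum(dx * dy) / (n - 1)

manual = np.array([[var_x, cov_xy],
                   [cov_xy, var_y]])

# The hand-built matrix matches numpy.cov()
print(np.allclose(manual, np.cov(x, y)))  # True
```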

⚠️

Common Pitfalls

Common mistakes when using numpy.cov() include:

Passing a 2D array whose variables are in columns without setting rowvar=False.
Misunderstanding the shape of the output: for k variables, the result is a k-by-k matrix.
Confusing covariance with correlation.

Example of a common mistake and fix:

python

# Wrong: variables are in columns, but rowvar=True (default) treats each row as a variable
import numpy as np
data = np.array([[2.1, 8],
                 [2.5, 10],
                 [4.0, 12],
                 [3.6, 14]])
cov_wrong = np.cov(data)
print('Wrong covariance matrix shape:', cov_wrong.shape)

# Right: set rowvar=False when variables are columns
cov_right = np.cov(data, rowvar=False)
print('Correct covariance matrix:')
print(cov_right)

Output

Wrong covariance matrix shape: (4, 4)
Correct covariance matrix:
[[0.80333333 2.        ]
 [2.         6.66666667]]
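The third pitfall, confusing covariance with correlation, can be illustrated with numpy.corrcoef(). Correlation is covariance divided by the product of the two standard deviations, so it is unaffected by the scale of the data, while covariance is not. This is a minimal sketch; the factor of 100 is an arbitrary rescaling chosen for the demonstration.

```python
import numpy as np

x = np.array([2.1, 2.5, 4.0, 3.6])
y = np.array([8, 10, 12, 14])

cov = np.cov(x, y)
corr = np.corrcoef(x, y)

# Correlation = covariance / (std_i * std_j)
std = np.sqrt(np.diag(cov))
corr_from_cov = cov / np.outer(std, std)
print(np.allclose(corr, corr_from_cov))  # True

# Rescaling x (for example, changing units) scales the covariance...
cov_scaled = np.cov(x * 100, y)
print(np.isclose(cov_scaled[0, 1], 100 * cov[0, 1]))  # True

# ...but leaves the correlation unchanged
corr_scaled = np.corrcoef(x * 100, y)
print(np.allclose(corr_scaled, corr))  # True
```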

📊

Quick Reference

Summary tips for using numpy.cov():

Input data shape matters: use rowvar to specify variable orientation.
Output is a covariance matrix showing covariance between each pair of variables.
Default normalization uses N-1 (sample covariance).
Use bias=True for population covariance (normalizing by N).
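The points about output shape and orientation can be verified with a short sketch. The three-variable array below is made-up data used only for illustration.

```python
import numpy as np

# Hypothetical data: 3 variables (rows), 5 observations each
data = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 1.0, 4.0, 3.0, 6.0],
    [9.0, 7.0, 8.0, 5.0, 6.0],
])

cov = np.cov(data)
print(cov.shape)  # (3, 3): one row and one column per variable

# The diagonal holds the sample variance (N - 1) of each variable
row_variances = np.var(data, axis=1, ddof=1)
print(np.allclose(np.diag(cov), row_variances))  # True

# The matrix is symmetric: cov(i, j) == cov(j, i)
print(np.allclose(cov, cov.T))  # True

# Same data with variables in columns needs rowvar=False
print(np.allclose(np.cov(data.T, rowvar=False), cov))  # True
```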

✅

Key Takeaways

Use numpy.cov() to compute the covariance matrix of your data arrays.

Set rowvar=False if your variables are in columns, not rows.

The covariance matrix shows how variables vary together; its diagonal holds each variable's variance.

Default normalization divides by N-1 for sample covariance.

Check data shape and parameters to avoid incorrect results.