How to Calculate Covariance Using NumPy in Python
You can calculate covariance in NumPy with the numpy.cov() function, which computes the covariance matrix of its input data. Pass one 2D array, or two 1D arrays, and it returns a matrix holding the covariance of every pair of variables.
Syntax
The basic syntax of numpy.cov() is:
numpy.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None)
Here:
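- m: Input data array (1D or 2D) containing the variables and their observations.
- y: Optional second data array, in the same form as m, whose variables are added to those in m.
- rowvar: If True (default), each row represents a variable and each column an observation; if False, each column is a variable.
- bias: If False (default), normalization is by N-1; if True, by N.
- ddof: Delta degrees of freedom. If not None, it overrides the normalization implied by bias; in the unweighted case the divisor is N - ddof.
- fweights and aweights: Integer frequency weights (how many times each observation is repeated) and observation weights (the relative importance of each observation).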
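The normalization and weighting parameters are easier to see in action. The following is a minimal sketch, reusing small illustrative arrays; the variable names (sample, population, counts, and so on) are chosen here for illustration only.

```python
import numpy as np

x = np.array([2.1, 2.5, 4.0, 3.6])
y = np.array([8, 10, 12, 14])
n = x.size

sample = np.cov(x, y)                 # default: divide by N - 1
population = np.cov(x, y, bias=True)  # divide by N
explicit = np.cov(x, y, ddof=0)       # ddof overrides bias; same divisor as bias=True

# The two normalizations differ only by the factor (N - 1) / N
print(np.allclose(population, sample * (n - 1) / n))
print(np.allclose(population, explicit))

# fweights are integer counts: weighting an observation by 2
# is equivalent to listing it twice in the data
counts = np.array([1, 2, 1, 1])
weighted = np.cov(x, y, fweights=counts)
repeated = np.cov(np.repeat(x, counts), np.repeat(y, counts))
print(np.allclose(weighted, repeated))
```

All three checks should print True.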
Example
This example calculates the covariance matrix of two variables with numpy.cov(). The result is a 2x2 matrix: the diagonal holds the variance of each array, and the off-diagonal entries hold the covariance between them.
python
import numpy as np

# Two data arrays
x = np.array([2.1, 2.5, 4.0, 3.6])
y = np.array([8, 10, 12, 14])

# Calculate covariance matrix
cov_matrix = np.cov(x, y)
print(cov_matrix)
Output
[[0.80333333 2.        ]
 [2.         6.66666667]]
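To see where the entries of the matrix come from, the sample covariance can be computed by hand: subtract each array's mean, multiply the deviations element-wise, sum them, and divide by N-1. This is a sketch of that standard formula for comparison with numpy.cov(); the helper names (dx, dy, manual) are illustrative.

```python
import numpy as np

x = np.array([2.1, 2.5, 4.0, 3.6])
y = np.array([8, 10, 12, 14])
n = x.size

# Deviations of each observation from its array's mean
dx = x - x.mean()
dy = y - y.mean()

# Sample (N - 1) variance and covariance
var_x = (dx ** 2).sum() / (n - 1)
var_y = (dy ** 2).sum() / (n - 1)
cov_xy = (dx * dy).sum() / (n - 1)

# Assemble the same layout numpy.cov() returns
manual = np.array([[var_x, cov_xy],
                   [cov_xy, var_y]])

print(np.allclose(manual, np.cov(x, y)))
```

The check should print True, confirming that the diagonal entries are the variances and the off-diagonal entry is the covariance of x and y.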
Common Pitfalls
Common mistakes when using numpy.cov() include:
- Passing a 2D array whose variables are in columns without setting rowvar=False.
- Misunderstanding the shape of the output covariance matrix.
- Confusing covariance with correlation.
Example of a common mistake and fix:
python
import numpy as np

# Four observations (rows) of two variables (columns)
data = np.array([[2.1, 8], [2.5, 10], [4.0, 12], [3.6, 14]])

# Wrong: the default rowvar=True treats each of the 4 rows as a variable
cov_wrong = np.cov(data)
print('Wrong covariance matrix shape:', cov_wrong.shape)

# Right: set rowvar=False when variables are in columns
cov_right = np.cov(data, rowvar=False)
print('Correct covariance matrix:')
print(cov_right)
Output
Wrong covariance matrix shape: (4, 4)
Correct covariance matrix:
[[0.80333333 2.        ]
 [2.         6.66666667]]
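The third pitfall, confusing covariance with correlation, is also worth a quick check. Covariance depends on the units of the data, while correlation is the covariance rescaled by both standard deviations and always lies between -1 and 1. The sketch below uses numpy.corrcoef() for the correlation matrix; the variable names are illustrative.

```python
import numpy as np

x = np.array([2.1, 2.5, 4.0, 3.6])
y = np.array([8, 10, 12, 14])

cov = np.cov(x, y)

# Correlation = covariance divided by the product of standard deviations
std = np.sqrt(np.diag(cov))
corr_from_cov = cov / np.outer(std, std)
print(np.allclose(corr_from_cov, np.corrcoef(x, y)))

# Rescaling x (e.g. changing its units) scales the covariance...
cov_scaled = np.cov(x * 100, y)
print(np.isclose(cov_scaled[0, 1], 100 * cov[0, 1]))

# ...but leaves the correlation unchanged
print(np.allclose(np.corrcoef(x * 100, y), np.corrcoef(x, y)))
```

All three checks should print True.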
Quick Reference
Summary tips for using numpy.cov():
- Input data shape matters: use rowvar to specify variable orientation.
- Output is a covariance matrix showing covariance between each pair of variables.
- Default normalization uses N-1 (sample covariance).
- Use bias=True for population covariance (normalizing by N).
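The tips above can be checked on a slightly larger dataset. This is a minimal sketch using randomly generated data (an assumption made purely for illustration) with 50 observations of 3 variables stored in columns.

```python
import numpy as np

# Illustrative data: 50 observations (rows) of 3 variables (columns)
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))

cov = np.cov(data, rowvar=False)

print(cov.shape)                # one row and one column per variable
print(np.allclose(cov, cov.T))  # the matrix is symmetric

# The diagonal matches the sample variance (N - 1) of each column
print(np.allclose(np.diag(cov), data.var(axis=0, ddof=1)))

# Forgetting rowvar=False treats all 50 rows as variables instead
print(np.cov(data).shape)
```

The first shape should be (3, 3) and the last (50, 50), which makes a wrong orientation easy to spot from the output shape alone.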
Key Takeaways
Use numpy.cov() to compute the covariance matrix of one or more data arrays.
Set rowvar=False if your variables are in columns, not rows.
The covariance matrix shows how variables vary together; its diagonal holds each variable's variance.
Default normalization divides by N-1 for sample covariance.
Check data shape and parameters to avoid incorrect results.