
How to Use corr in pandas: Calculate Correlation Easily

Use the corr() method on a pandas DataFrame to calculate the pairwise correlation matrix of its numeric columns. It returns a new DataFrame of correlation coefficients, each between -1 and 1, which measure how strongly and in which direction two columns move together.
📐

Syntax

The basic syntax of corr() in pandas is:

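python
DataFrame.corr(method='pearson', min_periods=1, numeric_only=False)

method: The correlation method to use. Default is 'pearson'. Other options are 'kendall', 'spearman', or a callable that takes two 1-D arrays and returns a float.

min_periods: Minimum number of observations required per pair of columns to have a valid result. Pairs with fewer shared observations yield NaN. Currently only applied for 'pearson' and 'spearman'.

numeric_only: Whether to include only float, int, or boolean columns. Available since pandas 1.5; the default is False since pandas 2.0.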
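The difference between the methods is easiest to see on data that rises steadily but not in a straight line. The sketch below uses a made-up DataFrame (the column names x and y are illustrative) and compares 'pearson' with 'spearman'. method='kendall' is left out because it may require SciPy to be installed.

```python
import pandas as pd

# Hypothetical data: y always increases with x, but not linearly (y = x**3)
df = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
df["y"] = df["x"] ** 3

pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")

# Pearson measures linear association, so it falls below 1 here
print(pearson.loc["x", "y"])

# Spearman works on ranks, and the ranks of x and y agree exactly
print(spearman.loc["x", "y"])
```

Pearson stays high but below 1 because the curve is not a line, while Spearman reports a perfect monotonic relationship.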
💻

Example

This example calculates the correlation matrix of a DataFrame with three numeric columns using corr(). Each cell shows how strongly the row column and the column column are related.

python
import pandas as pd

data = {
    'age': [25, 32, 47, 51, 62],
    'income': [50000, 60000, 80000, 90000, 120000],
    'score': [200, 220, 250, 270, 300]
}
df = pd.DataFrame(data)

# Round for readability; pandas prints six decimals by default
correlation_matrix = df.corr().round(3)
print(correlation_matrix)
Output
          age  income  score
age     1.000   0.982  0.994
income  0.982   1.000  0.991
score   0.994   0.991  1.000
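Often only one or two coefficients are of interest rather than the whole matrix. As a rough sketch using the same data, a single value can be read from the matrix by label or computed directly with Series.corr(), and one column of the matrix can be sorted to rank the other columns against it. The variable names here are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47, 51, 62],
    "income": [50000, 60000, 80000, 90000, 120000],
    "score": [200, 220, 250, 270, 300],
})

matrix = df.corr()

# One coefficient from the matrix, looked up by row and column label
age_income = matrix.loc["age", "income"]

# The same pair computed directly from two Series
age_income_direct = df["age"].corr(df["income"])

# Correlation of every other column with 'score', strongest first
with_score = matrix["score"].drop("score").sort_values(ascending=False)

print(round(age_income, 3), round(age_income_direct, 3))
print(with_score.round(3))
```

Both lookups give the same number; the Series form avoids building the full matrix when only one pair matters.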
⚠️

Common Pitfalls

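Common mistakes when using corr() include:

  • Calling corr() on a DataFrame with non-numeric columns. In pandas 2.0 and later this raises a ValueError unless you pass numeric_only=True; older versions silently dropped those columns.
  • Not handling missing values. NaN is excluded pair by pair, so different cells of the matrix can be based on different rows.
  • Assuming correlation implies causation; correlation only shows association strength.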
python
import pandas as pd

data = {
    'age': [25, 32, None, 51, 62],
    'income': [50000, 60000, 80000, None, 120000],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva']
}
df = pd.DataFrame(data)

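# In pandas 2.0+, corr() raises a ValueError on the string column 'name'
# unless numeric_only=True is passed
correlation = df.corr(numeric_only=True).round(3)
print(correlation)

# Missing values are excluded pairwise: only the 3 rows where both
# age and income are present contribute to the coefficient
Output
          age  income
age     1.000   0.999
income  0.999   1.000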
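Pairwise exclusion can leave a coefficient resting on very few rows. The following sketch, reusing the age and income values from the example above, counts how many rows each pair actually shares, then shows two possible responses: requiring a minimum with min_periods, or dropping incomplete rows first. The threshold of 4 is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, None, 51, 62],
    "income": [50000, 60000, 80000, None, 120000],
})

# Count the rows where both columns of each pair are present
present = df.notna().astype(int)
pair_counts = present.T @ present
print(pair_counts)

# Require at least 4 shared observations; age and income share only 3,
# so their coefficient becomes NaN
strict = df.corr(min_periods=4)
print(strict)

# Alternative: drop incomplete rows so every pair uses the same rows
complete = df.dropna().corr()
print(complete.round(3))
```

Checking the pair counts first makes it clear whether a high coefficient is backed by enough data to be worth reporting.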
📊

Quick Reference

Summary tips for using corr():

  • Use method='pearson' for linear correlation (default).
  • Use method='kendall' or method='spearman' for rank-based correlation.
  • Missing values are ignored pairwise by default.
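  • Pass numeric_only=True to skip non-numeric columns; since pandas 2.0 they are no longer dropped automatically.

Parameter | Description | Default
method | Correlation method: 'pearson', 'kendall', 'spearman', or a callable | 'pearson'
min_periods | Minimum observations required per pair | 1
numeric_only | Include only float, int, or boolean columns | False (pandas 2.0+)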
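When code has to run on several pandas versions, one option is to select the numeric columns explicitly before calling corr(), which sidesteps the numeric_only default entirely. A minimal sketch, assuming a DataFrame with one text column:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47, 51, 62],
    "income": [50000, 60000, 80000, 90000, 120000],
    "name": ["Alice", "Bob", "Charlie", "David", "Eva"],
})

# Keep only numeric columns explicitly, then correlate
numeric_df = df.select_dtypes(include="number")
matrix = numeric_df.corr()

print(list(matrix.columns))
```

The 'name' column never reaches corr(), so the call behaves the same whether or not the installed version drops non-numeric columns on its own.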

Key Takeaways

Use DataFrame.corr() to get the correlation matrix of a DataFrame's numeric columns.
Default method is 'pearson' for linear correlation; others include 'kendall' and 'spearman'.
Pass numeric_only=True when the DataFrame has non-numeric columns; pandas 2.0+ raises an error otherwise.
Missing values are excluded pairwise and do not cause errors, but they reduce the number of rows behind each coefficient.
Correlation shows association strength, not cause-effect relationships.