
How to Plot a Correlation Heatmap in Python Easily

To plot a correlation heatmap in Python, use pandas to calculate the correlation matrix and seaborn's heatmap() to draw it. The result is a color-coded grid in which each cell shows how strongly two variables are correlated.

Syntax

First, calculate the correlation matrix from your data using DataFrame.corr(). Then, pass this matrix to seaborn.heatmap() to create the heatmap.

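  • df.corr(): Computes the pairwise correlation between columns (Pearson by default) and returns it as a DataFrame.
  • sns.heatmap(data): Draws the given 2D data, here the correlation matrix, as a color-coded grid.
  • Optional parameters such as annot=True write the correlation value inside each cell.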
python
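import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Assuming df is an existing DataFrame with numeric columns
# Calculate the correlation matrix
corr_matrix = df.corr()

# Plot the heatmap with the value written in each cell
sns.heatmap(corr_matrix, annot=True)
plt.show()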
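
Before plotting, it can help to inspect the correlation matrix itself, since that is exactly what the heatmap will color. The sketch below uses a small made-up DataFrame (the column names and values are illustrative assumptions, not a real dataset) and shows that df.corr() returns a square, symmetric DataFrame with 1.0 on the diagonal. It also uses the method parameter of DataFrame.corr() to switch from the default Pearson coefficient to Spearman rank correlation.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    'hours_studied': [1, 2, 3, 4, 5],
    'exam_score': [52, 58, 65, 70, 79],
    'hours_gaming': [9, 7, 6, 4, 2],
})

# Pearson correlation is the default method
corr_matrix = df.corr()

# One row and one column per input column
print(corr_matrix.shape)
print(corr_matrix.round(2))

# Spearman rank correlation measures monotonic rather than linear relationships
spearman_matrix = df.corr(method='spearman')
print(spearman_matrix.round(2))
```

Either matrix can be passed to sns.heatmap() in the same way.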

Example

This example builds a small sample DataFrame, calculates its correlation matrix with pandas, and plots it with seaborn. Colors encode the strength and sign of each correlation, and annot=True prints the value in each cell.

python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
data = {
    'age': [25, 32, 47, 51, 62],
    'income': [50000, 60000, 80000, 90000, 120000],
    'score': [200, 220, 250, 270, 300]
}

df = pd.DataFrame(data)

# Calculate correlation matrix
corr = df.corr()

# Plot heatmap
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
Output
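When run as a script, a window opens showing a 3x3 color-coded heatmap of the correlations between 'age', 'income', and 'score' (in a Jupyter notebook the plot renders inline). In this sample data every pair is strongly positively correlated, with all values above 0.98, and the diagonal is 1.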
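
Because a correlation matrix is symmetric, the upper triangle repeats the lower one. A common refinement, sketched here as one option rather than a required step, is to hide the upper triangle with the mask parameter of sns.heatmap() and to pin the color scale to the full correlation range with vmin=-1 and vmax=1, so that colors mean the same thing across different plots. The data repeats the sample above; the figure size, number format, and title are arbitrary choices.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Same sample data as the example above
df = pd.DataFrame({
    'age': [25, 32, 47, 51, 62],
    'income': [50000, 60000, 80000, 90000, 120000],
    'score': [200, 220, 250, 270, 300],
})
corr = df.corr()

# Boolean mask: True hides a cell.
# k=1 hides everything above the diagonal and keeps the diagonal visible.
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(
    corr,
    mask=mask,         # hide the mirrored upper triangle
    annot=True,        # write the value in each visible cell
    fmt='.2f',         # two decimal places
    cmap='coolwarm',
    vmin=-1, vmax=1,   # fix the color scale to the full correlation range
    square=True,
    ax=ax,
)
ax.set_title('Correlation Heatmap (lower triangle)')
fig.tight_layout()
plt.show()
```

Fixing vmin and vmax matters most when the correlations are all on one side of zero, as they are here; otherwise seaborn scales the colors to the range of the data.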

Common Pitfalls

Common mistakes when plotting correlation heatmaps include:

  • Passing raw data to heatmap() instead of calculating the correlation matrix first.
  • Forgetting to import matplotlib.pyplot and call plt.show() to display the plot when running a script.
  • Including non-numeric columns: in pandas 2.0 and later, df.corr() raises an error on them unless you select the numeric columns first or pass numeric_only=True.
  • Not setting annot=True if you want to see correlation numbers on the heatmap.
python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data with one text column and two numeric columns
data = {
    'name': ['Alice', 'Bob', 'Cara'],
    'age': [25, 30, 41],
    'income': [40000, 52000, 61000]
}
df = pd.DataFrame(data)

# Wrong: passing raw data (including a text column) to heatmap
# sns.heatmap(df)  # This will raise an error

# Right: keep the numeric columns and calculate the correlation matrix first
corr = df.select_dtypes(include='number').corr()
sns.heatmap(corr, annot=True)
plt.show()
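
As an alternative to select_dtypes(), DataFrame.corr() accepts numeric_only=True, which skips non-numeric columns for you. This parameter is available in pandas 1.5 and later, so the sketch below assumes a reasonably recent pandas; the city weather values are made up for illustration. Both routes should produce the same matrix.

```python
import pandas as pd

# Hypothetical data with one text column and two numeric columns
df = pd.DataFrame({
    'city': ['Oslo', 'Cairo', 'Lima', 'Perth'],
    'temperature': [6, 29, 19, 24],
    'humidity': [78, 40, 82, 55],
})

# Option 1: keep only the numeric columns, then correlate
corr_selected = df.select_dtypes(include='number').corr()

# Option 2: let corr() ignore non-numeric columns (assumes pandas >= 1.5)
corr_numeric_only = df.corr(numeric_only=True)

print(corr_selected.round(2))
print(corr_numeric_only.round(2))
```

The select_dtypes() route also works on older pandas versions, which makes it the safer choice when the installed version is unknown.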

Quick Reference

Tips for plotting correlation heatmaps:

  • Use df.corr() to get correlations.
  • Use sns.heatmap() to visualize.
  • Set annot=True to show numbers.
  • Choose color maps like 'coolwarm' for clear visuals.
  • Always import matplotlib.pyplot and call plt.show() to display.

Key Takeaways

Calculate the correlation matrix with pandas before plotting.
Use seaborn's heatmap with annot=True to see correlation values.
Only numeric data can be used for correlation heatmaps.
Always call plt.show() to display the heatmap plot.
Choose color maps to make the heatmap easy to read.