How to Plot Correlation Heatmap in Python Easily
To plot a correlation heatmap in Python, use pandas to calculate the correlation matrix and seaborn to visualize it with heatmap(). This shows how variables relate to each other in a colorful grid.
Syntax
First, calculate the correlation matrix from your data using DataFrame.corr(). Then, pass this matrix to seaborn.heatmap() to create the heatmap.
- df.corr(): Computes pairwise correlation of columns.
- sns.heatmap(data): Plots the heatmap of the given data.
- Optional parameters like annot=True add numbers on the heatmap.
python
import pandas as pd
import seaborn as sns

# Assuming df is your DataFrame

# Calculate correlation matrix
corr_matrix = df.corr()

# Plot heatmap
sns.heatmap(corr_matrix, annot=True)
Example
This example shows how to create a correlation heatmap from a sample dataset using pandas and seaborn. It calculates correlations and displays them with colors and numbers.
python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
data = {
    'age': [25, 32, 47, 51, 62],
    'income': [50000, 60000, 80000, 90000, 120000],
    'score': [200, 220, 250, 270, 300]
}
df = pd.DataFrame(data)

# Calculate correlation matrix
corr = df.corr()

# Plot heatmap
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
Output
A window opens showing a colored heatmap with correlation values between 'age', 'income', and 'score'.
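Before plotting, it can help to look at the matrix that df.corr() returns. The sketch below reuses the sample data from the example and checks two properties every correlation matrix should have: it is symmetric, and its diagonal is 1 because each column correlates perfectly with itself. The variable names is_symmetric and diagonal_is_one are illustrative only, and NumPy is brought in just for the checks.

```python
import numpy as np
import pandas as pd

# Same sample data as the example above
df = pd.DataFrame({
    'age': [25, 32, 47, 51, 62],
    'income': [50000, 60000, 80000, 90000, 120000],
    'score': [200, 220, 250, 270, 300],
})

# corr() uses the Pearson method unless another one is requested
corr = df.corr()

# Print the matrix rounded for readability
print(corr.round(3))

# Sanity checks: the matrix mirrors across the diagonal,
# and every column correlates perfectly with itself
is_symmetric = np.allclose(corr.values, corr.values.T)
diagonal_is_one = np.allclose(np.diag(corr.values), 1.0)
print(is_symmetric, diagonal_is_one)
```

The heatmap is a colored rendering of this table, so anything that looks odd in the plot can usually be traced back to the printed values.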
Common Pitfalls
Common mistakes when plotting correlation heatmaps include:
- Not calculating the correlation matrix first and passing raw data to heatmap().
- Forgetting to import matplotlib.pyplot and call plt.show() to display the plot.
- Including non-numeric columns, which makes df.corr() raise an error in pandas 2.0 and later (earlier versions dropped those columns instead).
- Not setting annot=True if you want to see correlation numbers on the heatmap.
python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = {'name': ['Alice', 'Bob'], 'age': [25, 30]}
df = pd.DataFrame(data)

# Wrong: Passing raw data (non-numeric included) to heatmap
# sns.heatmap(df)  # This will raise an error

# Right: Calculate correlation matrix first
corr = df.select_dtypes(include='number').corr()
sns.heatmap(corr, annot=True)
plt.show()
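As a rough sketch of the non-numeric pitfall, the snippet below compares two ways of leaving text columns out of the calculation: selecting numeric columns with select_dtypes(), or passing numeric_only=True to corr(), which assumes pandas 1.5 or newer. The data is made up for illustration.

```python
import pandas as pd

# Made-up data with one text column and two numeric columns
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol', 'Dan'],
    'age': [25, 30, 41, 52],
    'income': [40000, 52000, 61000, 75000],
})

# Option 1: keep only numeric columns, then correlate
corr_selected = df.select_dtypes(include='number').corr()

# Option 2: ask corr() to skip non-numeric columns (pandas 1.5+)
corr_flag = df.corr(numeric_only=True)

# Both matrices cover the same numeric columns; 'name' is excluded
print(corr_flag.columns.tolist())
```

Either matrix can then be passed to sns.heatmap() as in the example above.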
Quick Reference
Tips for plotting correlation heatmaps:
- Use df.corr() to get correlations.
- Use sns.heatmap() to visualize.
- Set annot=True to show numbers.
- Choose color maps like 'coolwarm' for clear visuals.
- Always import matplotlib.pyplot and call plt.show() to display.
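The tips above can be combined into one sketch. Beyond what the tips mention, it makes a few choices of its own: vmin=-1 and vmax=1 pin the color scale to the full correlation range so colors stay comparable between plots, fmt='.2f' rounds the annotations, and square=True keeps the cells square. Treat these as one reasonable setup rather than required settings.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data (same shape as the earlier example)
df = pd.DataFrame({
    'age': [25, 32, 47, 51, 62],
    'income': [50000, 60000, 80000, 90000, 120000],
    'score': [200, 220, 250, 270, 300],
})
corr = df.corr()

# Draw on an explicit Axes so the figure size and title are easy to control
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(
    corr,
    annot=True,        # write the correlation value in each cell
    fmt='.2f',         # two decimal places for the annotations
    cmap='coolwarm',   # diverging map: blue for negative, red for positive
    vmin=-1, vmax=1,   # fix the color scale to the full correlation range
    square=True,       # keep cells square
    ax=ax,
)
ax.set_title('Correlation Heatmap')
fig.tight_layout()
# In a script, call plt.show() here to open the plot window.
```

Fixing vmin and vmax matters most when all correlations are positive, as in this sample: without them the color scale stretches to fit the data and small differences can look larger than they are.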
Key Takeaways
Calculate the correlation matrix with pandas before plotting.
Use seaborn's heatmap with annot=True to see correlation values.
Only numeric data can be used for correlation heatmaps.
Always call plt.show() to display the heatmap plot.
Choose color maps to make the heatmap easy to read.