How to Visualize Distribution in Python: Simple Guide
To visualize a distribution in Python, use the matplotlib or seaborn libraries to create plots such as histograms, boxplots, or KDE (kernel density estimate) plots. These plots help you see how data values spread and cluster.
Syntax
Here are common ways to visualize a distribution using matplotlib and seaborn:
- plt.hist(data, bins=number): Draws a histogram showing how many values fall in each bin.
- sns.boxplot(x=data): Creates a boxplot showing the median, quartiles, and outliers.
- sns.kdeplot(data): Draws a smooth curve estimating the data's density.
```python
import matplotlib.pyplot as plt
import seaborn as sns

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Histogram
plt.hist(data, bins=5)
plt.show()

# Boxplot
sns.boxplot(x=data)
plt.show()

# KDE plot
sns.kdeplot(data)
plt.show()
```
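To make concrete what a boxplot summarizes, the following sketch computes the same quantities by hand with NumPy: the quartiles, and the 1.5 x IQR fences that matplotlib and seaborn use by default to flag outliers. The helper name box_stats and the extra value 15 are illustrative assumptions, not part of the original example.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Return the summary numbers a boxplot draws (hypothetical helper)."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1  # interquartile range: the height of the box
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Points beyond the fences are the ones drawn as individual outlier markers
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "outliers": outliers,
    }

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
stats = box_stats(data)

# Same data plus one made-up extreme value, to show how an outlier is flagged
stats_with_outlier = box_stats(data + [15])

print(stats)
print(stats_with_outlier)
```

Comparing the two results shows why a single extreme value appears as a separate point in the boxplot instead of stretching the whiskers.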
Example
This example shows how to plot a histogram and a KDE plot for 1,000 random samples drawn from a standard normal distribution. The histogram shows how often values fall in each range, and the KDE plot shows the smooth shape of the distribution.
```python
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Generate random data from a standard normal distribution
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=1000)

# Histogram
plt.hist(data, bins=30, color='skyblue', edgecolor='black')
plt.title('Histogram of Data')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

# KDE plot (fill=True is the current name for the deprecated shade=True)
sns.kdeplot(data, fill=True, color='orange')
plt.title('KDE Plot of Data')
plt.xlabel('Value')
plt.ylabel('Density')
plt.show()
```
Common Pitfalls
Common mistakes when visualizing a distribution include:
- Using too few or too many bins in histograms, which can hide or exaggerate patterns.
- Not labeling axes or titles, making plots hard to understand.
- Confusing histogram frequency with probability density: by default a histogram shows counts per bin, while a KDE plot shows an estimated density, so their y-axes are not directly comparable.
- Ignoring outliers that can skew the visualization.
```python
import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Wrong: too few bins hides detail
plt.hist(data, bins=1)
plt.title('Too Few Bins')
plt.show()

# Right: reasonable bins show detail
plt.hist(data, bins=5)
plt.title('Good Number of Bins')
plt.show()
```
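Two of the pitfalls above, bin choice and counts versus density, can be checked numerically before plotting. The sketch below uses only NumPy. It lets a standard rule pick the bin edges (the Freedman-Diaconis rule via bins="fd" is an assumption here; "auto" or "sturges" work the same way) and then compares raw counts with density-normalized bar heights. The sample data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0, scale=1, size=1000)

# Let NumPy choose the bin edges with the Freedman-Diaconis rule
edges = np.histogram_bin_edges(data, bins="fd")

# Same bins, two different y-axis meanings
counts, _ = np.histogram(data, bins=edges)                 # values per bin
density, _ = np.histogram(data, bins=edges, density=True)  # normalized heights

widths = np.diff(edges)
total_count = int(counts.sum())
total_area = float((density * widths).sum())

print("number of bins:", len(edges) - 1)
print("sum of counts:", total_count)            # equals the sample size
print("area under density bars:", total_area)   # approximately 1
```

The edges array can be passed directly as plt.hist(data, bins=edges), and plt.hist accepts density=True as well, which is the setting to use when overlaying a histogram on a KDE curve.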
Quick Reference
Tips for visualizing distribution in Python:
- Use plt.hist() for simple frequency histograms.
- Use sns.boxplot() to see spread and outliers.
- Use sns.kdeplot() for smooth density curves.
- Adjust bins in histograms to balance detail and clarity.
- Always label your plots for clarity.
Key Takeaways
Use matplotlib and seaborn to create histograms, boxplots, and KDE plots for distribution visualization.
Adjust histogram bins carefully to reveal meaningful data patterns.
Label your plots clearly to make them easy to understand.
KDE plots show smooth density estimates, while histograms show counts.
Watch out for outliers as they can affect your distribution visualization.