
How to Visualize Distribution in Python: Simple Guide

To visualize a distribution in Python, use the matplotlib or seaborn libraries to create histograms, boxplots, or KDE (kernel density estimate) plots. These plots show how data values spread out and where they cluster.

Syntax

Here are common ways to visualize distribution using matplotlib and seaborn:

  • plt.hist(data, bins=number): Draws a histogram showing frequency of data ranges.
  • sns.boxplot(x=data): Creates a boxplot showing median, quartiles, and outliers.
  • sns.kdeplot(data): Draws a smooth curve estimating data density.
python
import matplotlib.pyplot as plt
import seaborn as sns

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Histogram
plt.hist(data, bins=5)
plt.show()

# Boxplot
sns.boxplot(x=data)
plt.show()

# KDE plot
sns.kdeplot(data)
plt.show()

Example

This example plots a histogram and a KDE plot for 1,000 values drawn from a standard normal distribution. The histogram shows how often values fall in each range, and the KDE plot shows the smooth shape of the distribution.

python
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Generate random data from normal distribution
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=1000)

# Histogram
plt.hist(data, bins=30, color='skyblue', edgecolor='black')
plt.title('Histogram of Data')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

# KDE plot
# fill=True replaces the deprecated shade=True argument
sns.kdeplot(data, fill=True, color='orange')
plt.title('KDE Plot of Data')
plt.xlabel('Value')
plt.ylabel('Density')
plt.show()

Common Pitfalls

Common mistakes when visualizing distribution include:

  • Using too few or too many bins in histograms, which can hide or exaggerate patterns.
  • Not labeling axes or titles, making plots hard to understand.
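  • Confusing histogram counts with probability density: by default plt.hist() plots the number of values in each bin, while a KDE plot shows an estimated density whose area is 1. Pass density=True to plt.hist() to put both on the same scale.
  • Ignoring outliers, which stretch the axis range and can squeeze most of the data into a few bins.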
python
import matplotlib.pyplot as plt
import seaborn as sns

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Wrong: too few bins hides detail
plt.hist(data, bins=1)
plt.title('Too Few Bins')
plt.show()

# Right: reasonable bins show detail
plt.hist(data, bins=5)
plt.title('Good Number of Bins')
plt.show()
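Instead of guessing a bin count, one option is to let NumPy choose the bin edges with a named rule such as 'auto', which plt.hist() also accepts as the bins argument. This is a sketch on an illustrative random sample (an assumption, not data from this guide), and the chosen rule is only a starting point to adjust by eye.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative sample (an assumption for this sketch)
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=1000)

# Ask NumPy which bin edges the 'auto' rule would choose
edges = np.histogram_bin_edges(data, bins='auto')
print(f"'auto' picked {len(edges) - 1} bins")

# plt.hist accepts the same rule name directly
counts, used_edges, _ = plt.hist(data, bins='auto',
                                 color='skyblue', edgecolor='black')
plt.title("Histogram with bins='auto'")
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
```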

Quick Reference

Tips for visualizing distribution in Python:

  • Use plt.hist() for simple frequency histograms.
  • Use sns.boxplot() to see spread and outliers.
  • Use sns.kdeplot() for smooth density curves.
  • Adjust bins in histograms to balance detail and clarity.
  • Always label your plots for clarity.
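To see which points a boxplot would flag as outliers, the usual 1.5 × IQR rule can be computed by hand. The sketch below assumes that rule (to my knowledge the default whisker setting in matplotlib and seaborn boxplots) and uses a small made-up list with one extreme value added for illustration.

```python
import numpy as np

# Illustrative data with one extreme value (an assumption for this sketch)
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 20])

# First and third quartiles, and the interquartile range between them
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Points more than 1.5 * IQR beyond the quartiles are treated as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]
print(outliers)  # [20]
```

Checking the flagged values first makes it easier to decide whether to keep them, plot them separately, or limit the axis range.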

Key Takeaways

Use matplotlib and seaborn to create histograms, boxplots, and KDE plots for distribution visualization.
Adjust histogram bins carefully to reveal meaningful data patterns.
Label your plots clearly to make them easy to understand.
KDE plots show smooth density estimates, while histograms show counts by default (or densities with density=True).
Check for outliers, since they can stretch the axes and hide the shape of the rest of the data.