How to Check Normal Distribution in Python: Simple Methods

To check for normal distribution in Python, use statistical tests like scipy.stats.shapiro or scipy.stats.normaltest. You can also visualize data with matplotlib using histograms or Q-Q plots to see if it looks like a bell curve.

📐

Syntax

Here are common ways to check normal distribution in Python:

scipy.stats.shapiro(data): Performs Shapiro-Wilk test for normality.
scipy.stats.normaltest(data): Performs D’Agostino’s K-squared test.
matplotlib.pyplot.hist(data): Plots histogram to visualize data distribution.
scipy.stats.probplot(data, plot=plt): Creates a Q-Q plot to compare data quantiles to a normal distribution.

python

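import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Placeholder sample so the snippet runs; replace with your own 1-D numeric data
data = np.random.default_rng(0).normal(size=200)

# Shapiro-Wilk test: returns the W statistic and a p-value
shapiro_test = stats.shapiro(data)

# D’Agostino’s K-squared test: returns the K-squared statistic and a p-value
normal_test = stats.normaltest(data)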

# Histogram
plt.hist(data, bins=30)
plt.show()

# Q-Q plot
stats.probplot(data, plot=plt)
plt.show()
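
The D’Agostino test is listed above but not demonstrated elsewhere, so here is a minimal sketch of how it might be used. The sample, its seed, and the variable names are assumptions made for illustration. The test combines sample skewness and kurtosis into one statistic, so printing those two values alongside the result can help show which aspect of the shape drives it.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: 500 draws from a normal distribution (seeded so it repeats)
rng = np.random.default_rng(42)
sample = rng.normal(loc=10, scale=2, size=500)

# normaltest combines skewness and kurtosis into a single K-squared statistic
result = stats.normaltest(sample)
print(f"K-squared statistic: {result.statistic:.4f}, p-value: {result.pvalue:.4f}")

# The two ingredients can be inspected separately
sample_skew = stats.skew(sample)
sample_kurtosis = stats.kurtosis(sample)  # excess kurtosis: 0 for a normal distribution
print(f"Skewness: {sample_skew:.4f}, excess kurtosis: {sample_kurtosis:.4f}")
```

Values of skewness and excess kurtosis close to zero are what a normal sample is expected to produce.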

💻

Example

This example shows how to check if random data follows a normal distribution using Shapiro-Wilk test and a Q-Q plot.

python

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Generate random data from a normal distribution
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=1000)

# Shapiro-Wilk test
stat, p_value = stats.shapiro(data)
print(f"Shapiro-Wilk test statistic: {stat:.4f}, p-value: {p_value:.4f}")

# Interpret result (H0: the data come from a normal distribution)
if p_value > 0.05:
    print("Data looks normally distributed (fail to reject H0)")
else:
    print("Data does not look normally distributed (reject H0)")

# Q-Q plot
stats.probplot(data, plot=plt)
plt.title("Q-Q plot for normality check")
plt.show()

Output

Shapiro-Wilk test statistic: 0.9990, p-value: 0.9452
Data looks normally distributed (fail to reject H0)
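
For contrast, it can help to run the same checks on data that is clearly not normal. The sketch below assumes an exponential sample as the skewed case (this dataset is not part of the example above). It also uses the fact that stats.probplot, when called without plot=, returns the Q-Q points together with a straight-line fit whose correlation r gives a rough numeric summary of the Q-Q plot.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = {
    "normal": rng.normal(loc=0, scale=1, size=1000),
    "exponential": rng.exponential(scale=1.0, size=1000),  # strongly right-skewed
}

results = {}
for name, values in samples.items():
    w_stat, p_value = stats.shapiro(values)
    # Without plot=..., probplot returns the Q-Q points and a least-squares line fit
    _, (slope, intercept, r) = stats.probplot(values, dist="norm")
    results[name] = {"W": w_stat, "p": p_value, "qq_r": r}
    print(f"{name:12s} W={w_stat:.4f}  p={p_value:.3g}  Q-Q r={r:.4f}")
```

The exponential sample should give a very small p-value and a visibly lower Q-Q correlation than the normal sample, which matches the curved shape its Q-Q plot would show.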

⚠️

Common Pitfalls

Using only visual checks can be misleading; always combine plots with statistical tests.
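Small samples give normality tests little power, so a high p-value from a handful of points is weak evidence of normality. scipy.stats.normaltest also needs at least 8 observations.
Large samples may flag tiny deviations that are not practically important. For more than 5000 observations, the SciPy documentation notes that the Shapiro-Wilk p-value may be inaccurate.
Both tests assume independent observations; applying them to autocorrelated data (such as a raw time series) can lead to wrong conclusions.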
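
The large-sample pitfall can be illustrated with a short sketch. It assumes a Student's t distribution with 20 degrees of freedom as the "almost normal" data (an illustrative choice, not something from this article) and uses normaltest rather than Shapiro-Wilk because of the sample size.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Student's t with 20 degrees of freedom: close to normal, slightly heavier tails
n = 100_000
nearly_normal = rng.standard_t(df=20, size=n)

stat, p_value = stats.normaltest(nearly_normal)
excess_kurtosis = stats.kurtosis(nearly_normal)

print(f"n={n}: K-squared={stat:.1f}, p-value={p_value:.3g}")
print(f"Excess kurtosis: {excess_kurtosis:.3f}")  # small, but detectable at this n
```

With this many points the test rejects normality even though the departure is modest, so it is worth looking at the size of the deviation (here the excess kurtosis) and not only at the p-value.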

python

import numpy as np
from scipy import stats

# Small sample: the test has little power, so a high p-value is weak evidence
small_data = np.random.normal(0, 1, 10)
stat, p = stats.shapiro(small_data)
print(f"Small sample p-value: {p:.4f}")  # Varies a lot from run to run

# Larger sample: the test is better able to detect departures from normality
large_data = np.random.normal(0, 1, 1000)
stat, p = stats.shapiro(large_data)
print(f"Large sample p-value: {p:.4f}")

Output (no seed is set, so the p-values change on every run; the values below are illustrative)

Small sample p-value: 0.1234
Large sample p-value: 0.9452
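
A single p-value does not show why small samples are unreliable, so the following sketch estimates how often Shapiro-Wilk rejects normality for data that is genuinely non-normal. The helper rejection_rate, the exponential test distribution, and the trial count are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

def rejection_rate(sample_size, n_trials=500, alpha=0.05, seed=0):
    """Fraction of exponential samples for which Shapiro-Wilk rejects normality."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_trials):
        sample = rng.exponential(scale=1.0, size=sample_size)
        _, p_value = stats.shapiro(sample)
        if p_value < alpha:
            rejections += 1
    return rejections / n_trials

rate_small = rejection_rate(sample_size=10)
rate_large = rejection_rate(sample_size=100)

print(f"Rejection rate with n=10:  {rate_small:.2f}")
print(f"Rejection rate with n=100: {rate_large:.2f}")
```

The n=10 rate should come out well below the n=100 rate: with only ten points, skewed data frequently passes the test, which is why a "pass" on a small sample says little.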

📊

Quick Reference

Summary tips for checking normal distribution in Python:

Use scipy.stats.shapiro or scipy.stats.normaltest for formal tests.
Visualize data with histograms and Q-Q plots using matplotlib.
Interpret p-values: p > 0.05 means there is no evidence against normality at the 5% level; it does not prove the data are normal.
Combine tests and plots for best results.
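
The tips above could be wrapped into one small helper. The function normality_report below is hypothetical (it is not part of SciPy), and its choices, such as dropping NaN values and refusing fewer than 8 observations, are assumptions for this sketch. It reports each test separately instead of merging them into a single verdict.

```python
import numpy as np
from scipy import stats

def normality_report(data, alpha=0.05):
    """Run Shapiro-Wilk and D'Agostino's K-squared test and summarize both."""
    values = np.asarray(data, dtype=float)
    values = values[~np.isnan(values)]  # assumption: missing values are simply dropped
    if values.size < 8:
        # normaltest is not valid for fewer than 8 observations
        raise ValueError("need at least 8 non-missing observations")

    shapiro_stat, shapiro_p = stats.shapiro(values)
    k2_stat, k2_p = stats.normaltest(values)
    return {
        "n": int(values.size),
        "shapiro_p": float(shapiro_p),
        "normaltest_p": float(k2_p),
        "shapiro_rejects": bool(shapiro_p < alpha),
        "normaltest_rejects": bool(k2_p < alpha),
    }

# Usage with a clearly non-normal (uniform) sample
rng = np.random.default_rng(3)
report = normality_report(rng.uniform(0, 1, size=500))
for key, value in report.items():
    print(f"{key}: {value}")
```

A Q-Q plot of the same data would still be worth drawing next to this report, since the tests only say whether a departure was detected, not what it looks like.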

✅

Key Takeaways

Use statistical tests like Shapiro-Wilk to check normality in Python.

Visualize data with histograms and Q-Q plots to see distribution shape.

Interpret p-values carefully: a value above 0.05 means normality is not rejected, not that it is confirmed.

Avoid relying on small samples or only visual checks for conclusions.

Combine tests and plots for a reliable normality assessment.