
How to Check Normal Distribution in Python: Simple Methods

To check for normal distribution in Python, use statistical tests like scipy.stats.shapiro or scipy.stats.normaltest. You can also visualize data with matplotlib using histograms or Q-Q plots to see if it looks like a bell curve.

Syntax

Here are common ways to check normal distribution in Python:

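  • scipy.stats.shapiro(data): Performs the Shapiro-Wilk test, whose null hypothesis is that the data come from a normal distribution.
  • scipy.stats.normaltest(data): Performs D’Agostino’s K-squared test, which combines skewness and kurtosis into a single statistic.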
  • matplotlib.pyplot.hist(data): Plots histogram to visualize data distribution.
  • scipy.stats.probplot(data, plot=plt): Creates a Q-Q plot to compare data quantiles to a normal distribution.
python
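import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Placeholder sample so the snippet runs; replace with your own 1-D numeric data
data = np.random.default_rng(0).normal(size=200)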

# Shapiro-Wilk test
shapiro_test = stats.shapiro(data)

# D’Agostino’s K-squared test
normal_test = stats.normaltest(data)

# Histogram
plt.hist(data, bins=30)
plt.show()

# Q-Q plot
stats.probplot(data, plot=plt)
plt.show()
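
A Q-Q plot can also be read numerically. When scipy.stats.probplot is called without the plot argument, it returns the Q-Q points together with a least-squares fit, including the correlation coefficient r of that fit. The sketch below is a minimal illustration, not a formal test: the helper name qq_correlation, the seed, and the two sample arrays are assumptions made for the example. Values of r close to 1 mean the points lie near a straight line.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)             # arbitrary fixed seed (assumption)
normal_sample = rng.normal(size=500)        # illustrative normal data
skewed_sample = rng.exponential(size=500)   # illustrative right-skewed data


def qq_correlation(sample):
    """Return r for the straight line fitted through the normal Q-Q points."""
    # Without `plot`, probplot only computes the points and the fitted line.
    (_theoretical, _ordered), (_slope, _intercept, r) = stats.probplot(
        sample, dist="norm"
    )
    return r


r_normal = qq_correlation(normal_sample)
r_skewed = qq_correlation(skewed_sample)
print(f"Q-Q r, normal sample: {r_normal:.4f}")
print(f"Q-Q r, skewed sample: {r_skewed:.4f}")
```

The skewed sample should give a visibly lower r than the normal one. Treat r as a companion to the plot rather than a replacement for a test, since it has no p-value attached.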

Example

This example shows how to check whether random data follows a normal distribution using the Shapiro-Wilk test and a Q-Q plot.

python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Generate random data from a normal distribution
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=1000)

# Shapiro-Wilk test
stat, p_value = stats.shapiro(data)
print(f"Shapiro-Wilk test statistic: {stat:.4f}, p-value: {p_value:.4f}")

# Interpret result
if p_value > 0.05:
    print("Data looks normally distributed (fail to reject H0)")
else:
    print("Data does not look normally distributed (reject H0)")

# Q-Q plot
stats.probplot(data, plot=plt)
plt.title("Q-Q plot for normality check")
plt.show()
Output
Shapiro-Wilk test statistic: 0.9990, p-value: 0.9452
Data looks normally distributed (fail to reject H0)
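
The example above uses only the Shapiro-Wilk test. scipy.stats.normaltest is called the same way and also returns a statistic and a p-value. The following sketch applies it to one normal sample and one clearly non-normal sample so the two outcomes can be compared. The sample names, the seed, and the alpha threshold are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary fixed seed (assumption)
samples = {
    "normal": rng.normal(loc=0, scale=1, size=1000),
    "exponential": rng.exponential(scale=1.0, size=1000),  # right-skewed
}

alpha = 0.05   # conventional significance level
results = {}
for name, sample in samples.items():
    # normaltest combines skewness and kurtosis into one K^2 statistic
    stat, p_value = stats.normaltest(sample)
    results[name] = (stat, p_value)
    verdict = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"{name}: K^2 = {stat:.2f}, p = {p_value:.3g} -> {verdict}")
```

The exponential sample should be rejected with a very small p-value. For the normal sample the test will usually fail to reject, although any single run has roughly a 5% chance of rejecting at this threshold.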

Common Pitfalls

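  • Relying on visual checks alone can mislead, because a histogram's shape depends on the number of bins; pair plots with a statistical test.
  • Small samples give the tests little power, so a high p-value from a handful of points is weak evidence of normality.
  • Large samples let the tests flag tiny deviations that are not practically important. The scipy.stats.shapiro documentation also notes that its p-value may not be accurate for more than 5000 points.
  • The tests assume independent observations; applying them to correlated data (for example, a raw time series) can lead to wrong conclusions.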
python
import numpy as np
from scipy import stats

# Risky: with only 10 points the test has little power
small_data = np.random.normal(0, 1, 10)
stat, p = stats.shapiro(small_data)
print(f"Small sample p-value: {p:.4f}")  # A high value here is weak evidence

# Better: a larger sample gives the test more power
large_data = np.random.normal(0, 1, 1000)
stat, p = stats.shapiro(large_data)
print(f"Large sample p-value: {p:.4f}")
Output (no seed is set, so the values change on every run)
Small sample p-value: 0.1234
Large sample p-value: 0.9452
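
To see why small samples are a problem, it helps to measure how often the test catches data that is known to be non-normal. The sketch below is a rough simulation under stated assumptions: it draws exponential samples (clearly not normal), runs scipy.stats.shapiro on each, and counts the fraction rejected at alpha = 0.05. The function name rejection_rate, the trial count, and the seed are hypothetical choices for this illustration.

```python
import numpy as np
from scipy import stats


def rejection_rate(n, trials=300, alpha=0.05, seed=0):
    """Fraction of exponential samples of size n that Shapiro-Wilk rejects."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(trials):
        sample = rng.exponential(size=n)   # non-normal by construction
        _, p = stats.shapiro(sample)
        if p < alpha:
            rejections += 1
    return rejections / trials


rate_small = rejection_rate(10)
rate_large = rejection_rate(100)
print(f"n=10:  rejected {rate_small:.0%} of non-normal samples")
print(f"n=100: rejected {rate_large:.0%} of non-normal samples")
```

With 10 points, a sizeable share of the skewed samples should pass the test unnoticed, while with 100 points nearly all of them should be rejected. A passing result on a tiny sample therefore says little about the underlying distribution.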

Quick Reference

Summary tips for checking normal distribution in Python:

  • Use scipy.stats.shapiro or scipy.stats.normaltest for formal tests.
  • Visualize data with histograms and Q-Q plots using matplotlib.
  • Interpret p-values: p > 0.05 means the test found no significant evidence against normality; it does not prove the data are normal.
  • Combine tests and plots for best results.

Key Takeaways

Use statistical tests like Shapiro-Wilk to check normality in Python.
Visualize data with histograms and Q-Q plots to see distribution shape.
Interpret p-values carefully: a value above 0.05 means normality is not rejected, not that it is confirmed.
Avoid relying on small samples or only visual checks for conclusions.
Combine tests and plots for a reliable normality assessment.