How to Check Normal Distribution in Python: Simple Methods
To check for normal distribution in Python, use statistical tests such as scipy.stats.shapiro or scipy.stats.normaltest. You can also visualize the data with matplotlib, using a histogram or a Q-Q plot, to see whether it looks like a bell curve.
Syntax
Here are common ways to check normal distribution in Python:
- scipy.stats.shapiro(data): Performs the Shapiro-Wilk test for normality.
- scipy.stats.normaltest(data): Performs D’Agostino’s K-squared test.
- matplotlib.pyplot.hist(data): Plots a histogram to visualize the data distribution.
- scipy.stats.probplot(data, plot=plt): Creates a Q-Q plot to compare data quantiles to a normal distribution.
```python
from scipy import stats
import matplotlib.pyplot as plt

# Shapiro-Wilk test
shapiro_test = stats.shapiro(data)

# D'Agostino's K-squared test
normal_test = stats.normaltest(data)

# Histogram
plt.hist(data, bins=30)
plt.show()

# Q-Q plot
stats.probplot(data, plot=plt)
plt.show()
```
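Each test above returns a result that unpacks into a statistic and a p-value, and probplot returns the Q-Q points plus a least-squares fit when no plot target is given. Here is a minimal sketch, assuming a synthetic sample stands in for data. The seed, sample size, and variable names are illustrative choices, not part of the original.

```python
import numpy as np
from scipy import stats

# Assumption: a synthetic normal sample stands in for `data`
rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=500)

# Both tests return a (statistic, p-value) pair
shapiro_stat, shapiro_p = stats.shapiro(data)
k2_stat, k2_p = stats.normaltest(data)

# Without `plot=`, probplot returns the Q-Q points and a fitted line
(theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(data)

print(f"Shapiro-Wilk: W={shapiro_stat:.4f}, p={shapiro_p:.4f}")
print(f"D'Agostino K^2: K2={k2_stat:.4f}, p={k2_p:.4f}")
print(f"Q-Q fit: slope={slope:.3f}, intercept={intercept:.3f}, r={r:.4f}")
```

For normal data the Q-Q points fall close to a straight line, so r is close to 1, and the slope and intercept approximate the sample's standard deviation and mean.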
Example
This example shows how to check whether random data follows a normal distribution using the Shapiro-Wilk test and a Q-Q plot.
```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Generate random data from a normal distribution
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=1000)

# Shapiro-Wilk test
stat, p_value = stats.shapiro(data)
print(f"Shapiro-Wilk test statistic: {stat:.4f}, p-value: {p_value:.4f}")

# Interpret result
if p_value > 0.05:
    print("Data looks normally distributed (fail to reject H0)")
else:
    print("Data does not look normally distributed (reject H0)")

# Q-Q plot
stats.probplot(data, plot=plt)
plt.title("Q-Q plot for normality check")
plt.show()
```
Output (illustrative; your exact values may differ)
Shapiro-Wilk test statistic: 0.9990, p-value: 0.9452
Data looks normally distributed (fail to reject H0)
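For contrast, the same tests can be run on data that is clearly not normal. The sketch below assumes a hypothetical helper, check_normality, and an exponential sample chosen for illustration. Neither appears in the original example.

```python
import numpy as np
from scipy import stats


def check_normality(sample, alpha=0.05):
    """Run both tests and report whether each rejects normality.

    Hypothetical helper for illustration, not a SciPy function.
    """
    _, shapiro_p = stats.shapiro(sample)
    _, k2_p = stats.normaltest(sample)
    return {
        "shapiro_p": float(shapiro_p),
        "normaltest_p": float(k2_p),
        "shapiro_rejects": bool(shapiro_p < alpha),
        "normaltest_rejects": bool(k2_p < alpha),
    }


# Assumption: an exponential sample serves as the non-normal example
rng = np.random.default_rng(0)
skewed = rng.exponential(scale=1.0, size=500)  # strongly right-skewed

result = check_normality(skewed)
for name, value in result.items():
    print(f"{name}: {value}")
```

Both tests should reject normality here, because an exponential distribution is strongly right-skewed.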
Common Pitfalls
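- Relying only on visual checks can be misleading; combine plots with statistical tests.
- Small samples give normality tests little power, so a high p-value may only mean there was too little data to detect a departure from normality.
- Large samples can make tests flag tiny deviations that are not practically important.
- Normality tests assume independent observations; applying them to data that violates this (for example, an autocorrelated time series) can lead to wrong conclusions.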
```python
import numpy as np
from scipy import stats

# Wrong: Using small sample size
small_data = np.random.normal(0, 1, 10)
stat, p = stats.shapiro(small_data)
print(f"Small sample p-value: {p:.4f}")  # May be unreliable

# Right: Use larger sample size
large_data = np.random.normal(0, 1, 1000)
stat, p = stats.shapiro(large_data)
print(f"Large sample p-value: {p:.4f}")
```
Output (no random seed is set, so the p-values change on every run; the values below are illustrative)
Small sample p-value: 0.1234
Large sample p-value: 0.9452
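The large-sample pitfall can be illustrated the same way. This sketch assumes a gamma distribution with a large shape parameter, which is only slightly skewed, and pairs the p-value with sample skewness and kurtosis as a rough measure of how large the deviation is. The distribution, sample size, and seed are illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Assumption: gamma with shape=100 has theoretical skewness 2/sqrt(100) = 0.2,
# so its histogram looks close to a bell curve
near_normal = rng.gamma(shape=100.0, scale=1.0, size=20_000)

# normaltest is used here because SciPy documents that Shapiro-Wilk
# p-values may be inaccurate for samples larger than 5000
stat, p = stats.normaltest(near_normal)
skewness = stats.skew(near_normal)
excess_kurtosis = stats.kurtosis(near_normal)  # 0 for a normal distribution

print(f"p-value: {p:.2e}")
print(f"skewness: {skewness:.3f}, excess kurtosis: {excess_kurtosis:.3f}")
```

With this many observations the test rejects normality even though the skewness is small, so whether the departure matters is a practical judgment that the p-value alone cannot settle.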
Quick Reference
Summary tips for checking normal distribution in Python:
- Use scipy.stats.shapiro or scipy.stats.normaltest for formal tests.
- Visualize data with histograms and Q-Q plots using matplotlib.
- Interpret p-values: p > 0.05 means the test found no significant evidence against normality; it does not prove the data is normal.
- Combine tests and plots for best results.
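The tips above can be combined into a single figure. The sketch below assumes a hypothetical helper, plot_normality_checks, that draws a histogram with a fitted normal curve next to a Q-Q plot and reports the Shapiro-Wilk p-value in the title. The helper name, layout, and output filename are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats


def plot_normality_checks(sample, bins=30):
    """Draw a histogram with a fitted normal curve next to a Q-Q plot.

    Hypothetical helper for illustration.
    """
    sample = np.asarray(sample, dtype=float)
    _, p_value = stats.shapiro(sample)

    fig, (ax_hist, ax_qq) = plt.subplots(1, 2, figsize=(10, 4))

    # Histogram scaled to a density so the normal PDF is comparable
    ax_hist.hist(sample, bins=bins, density=True, alpha=0.6)
    x = np.linspace(sample.min(), sample.max(), 200)
    ax_hist.plot(x, stats.norm.pdf(x, loc=sample.mean(), scale=sample.std(ddof=1)))
    ax_hist.set_title(f"Histogram (Shapiro-Wilk p = {p_value:.3f})")

    # probplot accepts a Matplotlib Axes as the plot target
    stats.probplot(sample, dist="norm", plot=ax_qq)
    ax_qq.set_title("Q-Q plot")

    fig.tight_layout()
    return fig, p_value


# Assumption: a synthetic normal sample is used as input
rng = np.random.default_rng(0)
fig, p_value = plot_normality_checks(rng.normal(size=1000))
fig.savefig("normality_checks.png")  # or plt.show() in an interactive session
```

Placing both plots side by side makes it easier to judge whether a low p-value reflects a visible departure from the bell curve or only a minor one.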
Key Takeaways
Use statistical tests like Shapiro-Wilk to check normality in Python.
Visualize data with histograms and Q-Q plots to see distribution shape.
Interpret p-values carefully: a value above 0.05 means normality is not rejected, not that it is proven.
Avoid relying on small samples or only visual checks for conclusions.
Combine tests and plots for a reliable normality assessment.