Correlation analysis measures how two variables change together. It tells you whether one tends to increase or decrease as the other increases, and how strong that tendency is.
Correlation analysis (Pearson, Spearman) in Data Analysis Python
    from scipy.stats import pearsonr, spearmanr

    # For Pearson correlation
    correlation, p_value = pearsonr(data1, data2)

    # For Spearman correlation
    correlation, p_value = spearmanr(data1, data2)
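pearsonr measures the strength of the linear relationship between two numeric sequences of equal length.
spearmanr measures the monotonic relationship by correlating the ranks of the values, which makes it suitable for non-linear (but monotonic) or ordinal data.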
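The difference between the two functions is easiest to see on data that is monotonic but not linear. The sketch below uses made-up data (y is the cube of x, an assumption chosen only for illustration): every increase in x comes with an increase in y, so the ranks agree perfectly, but the points do not lie on a straight line.

```python
from scipy.stats import pearsonr, spearmanr

# Illustrative data (not from the examples in this article):
# y grows as the cube of x, so it always increases but not along a line.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [value ** 3 for value in x]

pearson_corr, _ = pearsonr(x, y)
spearman_corr, _ = spearmanr(x, y)

# Pearson stays below 1 because the relationship is curved.
print(f"Pearson:  {pearson_corr:.3f}")
# Spearman reaches 1 because the rank order of x and y is identical.
print(f"Spearman: {spearman_corr:.3f}")
```

In this case Spearman reports a perfect monotonic relationship while Pearson reports a strong but imperfect linear one.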
    from scipy.stats import pearsonr

    x = [1, 2, 3, 4, 5]
    y = [2, 4, 6, 8, 10]

    correlation, p_value = pearsonr(x, y)
    print(correlation)
    from scipy.stats import spearmanr

    x = [1, 2, 3, 4, 5]
    y = [5, 6, 7, 8, 7]

    correlation, p_value = spearmanr(x, y)
    print(correlation)
This program calculates both Pearson and Spearman correlations between hours studied and exam scores to see how strongly they are related.
    from scipy.stats import pearsonr, spearmanr

    # Sample data: hours studied and exam scores
    hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
    exam_scores = [50, 55, 65, 70, 75, 80, 85, 90]

    # Calculate Pearson correlation
    pearson_corr, pearson_p = pearsonr(hours_studied, exam_scores)

    # Calculate Spearman correlation
    spearman_corr, spearman_p = spearmanr(hours_studied, exam_scores)

    print(f"Pearson correlation: {pearson_corr:.2f}")
    print(f"Spearman correlation: {spearman_corr:.2f}")
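Correlation values range from -1 to 1. A value close to 1 indicates a strong positive relationship, a value close to -1 indicates a strong negative relationship, and a value near 0 indicates no linear (Pearson) or monotonic (Spearman) relationship. A value near 0 does not rule out other kinds of dependence.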
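A coefficient near 0 can be misleading when the relationship is neither linear nor monotonic. The sketch below uses made-up data (y is the square of x over a symmetric range, an assumption chosen only for illustration): y is completely determined by x, yet both coefficients come out at about zero because the falling half cancels the rising half.

```python
from scipy.stats import pearsonr, spearmanr

# Illustrative data (not from the examples in this article):
# a U-shaped relationship that is symmetric around x = 0.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

pearson_corr, _ = pearsonr(x, y)
spearman_corr, _ = spearmanr(x, y)

# Both values are approximately 0 even though y depends entirely on x.
print(f"Pearson:  {pearson_corr:.3f}")
print(f"Spearman: {spearman_corr:.3f}")
```

Plotting the data before relying on a coefficient is a simple way to catch patterns like this one.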
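Use Pearson when both variables are numeric and the relationship is roughly linear. Use Spearman when the relationship is monotonic but not linear, or when the data are ranks or other ordinal values.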
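Always check the p-value to see whether the correlation is statistically significant (a common threshold is p < 0.05). The p-value estimates how likely a correlation at least this strong would be if the two variables were actually uncorrelated.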
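As a rough illustration of why the p-value matters, the sketch below compares a made-up four-point sample with the hours-studied data from the example program. The 0.05 threshold, the small sample, and the names ALPHA, datasets, and results are assumptions introduced here, not part of the original examples.

```python
from scipy.stats import pearsonr

ALPHA = 0.05  # assumed significance threshold; pick one that suits your analysis

# Each entry maps a label to a pair of sequences to correlate.
datasets = {
    # Hypothetical tiny sample: only four observations.
    "small sample": ([1, 2, 3, 4], [2, 1, 4, 3]),
    # Hours studied and exam scores, as in the example program.
    "study data": ([1, 2, 3, 4, 5, 6, 7, 8], [50, 55, 65, 70, 75, 80, 85, 90]),
}

results = {}
for label, (a, b) in datasets.items():
    corr, p_value = pearsonr(a, b)
    results[label] = (corr, p_value)
    verdict = "significant" if p_value < ALPHA else "not significant"
    print(f"{label}: r = {corr:.2f}, p = {p_value:.4f} -> {verdict}")
```

The small sample has a moderate coefficient but a p-value well above the threshold, so four points are not enough to distinguish it from chance. The study data has both a high coefficient and a p-value below the threshold.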
Correlation analysis shows how two variables move together.
Pearson measures linear relationships; Spearman measures monotonic relationships.
Both give a correlation value and a p-value to check significance.