Correlation analysis (Pearson, Spearman) in Data Analysis Python

Introduction

Correlation analysis measures how two variables change together. It tells us whether one variable tends to rise or fall as the other rises.

To check if study time and exam scores are related.
To see if temperature and ice cream sales move together.
To find if height and weight have a connection.
To understand if advertising budget and product sales are related (correlation alone does not show that one causes the other).
Syntax
Data Analysis Python
from scipy.stats import pearsonr, spearmanr

# For Pearson correlation
correlation, p_value = pearsonr(data1, data2)

# For Spearman correlation
correlation, p_value = spearmanr(data1, data2)

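pearsonr measures the strength of the linear relationship between two numeric samples of equal length.

spearmanr measures the strength of the monotonic relationship using ranks instead of raw values, which makes it a good fit for non-linear or ordinal data.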
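To make the "using ranks" part concrete, here is a minimal sketch showing that Spearman's coefficient matches Pearson's coefficient computed on the ranks of each sample. The data values are hypothetical and chosen only for illustration; rankdata is SciPy's ranking function, which assigns average ranks to ties.

```python
from scipy.stats import pearsonr, rankdata, spearmanr

# Hypothetical sample data, chosen only for illustration
data1 = [10, 20, 30, 40, 50]
data2 = [1, 4, 9, 8, 30]

# Spearman's coefficient computed directly on the raw values
spearman_corr, _ = spearmanr(data1, data2)

# Pearson's coefficient computed on the ranks of each sample
rank_corr, _ = pearsonr(rankdata(data1), rankdata(data2))

print(f"spearmanr on raw values: {spearman_corr:.4f}")
print(f"pearsonr on ranks:       {rank_corr:.4f}")
```

Both lines should print the same coefficient, because only the order of the values matters to Spearman, not their size.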

Examples
Calculates the Pearson correlation for two lists with a perfect positive linear relationship. The coefficient is 1, up to floating-point rounding.
Data Analysis Python
from scipy.stats import pearsonr

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
correlation, p_value = pearsonr(x, y)
print(correlation)
Calculates the Spearman correlation for data that mostly increases but is not strictly monotonic: the last value of y drops and ties with an earlier one, so the coefficient is high but below 1.
Data Analysis Python
from scipy.stats import spearmanr

x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 7]
correlation, p_value = spearmanr(x, y)
print(correlation)
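The two measures can be compared side by side on data that always increases but not along a straight line. The sketch below uses a hypothetical cubic relationship (y = x ** 3), which is an assumption made only for illustration.

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: y always increases with x, but along a curve
x = [1, 2, 3, 4, 5, 6]
y = [value ** 3 for value in x]

pearson_corr, _ = pearsonr(x, y)
spearman_corr, _ = spearmanr(x, y)

# Spearman sees a perfect monotonic trend; Pearson is pulled below 1 by the curvature
print(f"Pearson:  {pearson_corr:.3f}")
print(f"Spearman: {spearman_corr:.3f}")
```

Spearman reports a perfect monotonic relationship here, while Pearson stays below 1 because the points do not fall on a straight line.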
Sample Program

This program calculates both Pearson and Spearman correlations between hours studied and exam scores to see how strongly they are related.

Data Analysis Python
from scipy.stats import pearsonr, spearmanr

# Sample data: hours studied and exam scores
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
exam_scores = [50, 55, 65, 70, 75, 80, 85, 90]

# Calculate Pearson correlation
pearson_corr, pearson_p = pearsonr(hours_studied, exam_scores)

# Calculate Spearman correlation
spearman_corr, spearman_p = spearmanr(hours_studied, exam_scores)

print(f"Pearson correlation: {pearson_corr:.2f}")
print(f"Spearman correlation: {spearman_corr:.2f}")
Output
Pearson correlation: 0.99
Spearman correlation: 1.00
Important Notes

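Correlation values range from -1 to 1. A value close to 1 means a strong positive relationship, a value close to -1 means a strong negative relationship, and a value near 0 means no linear (Pearson) or monotonic (Spearman) relationship. A value near 0 does not rule out other kinds of dependence, such as a U-shaped pattern.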
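One caveat is worth seeing in code: a coefficient near 0 only says there is no linear or monotonic trend. The sketch below uses hypothetical U-shaped data (y = x ** 2 on values symmetric around 0), where y is fully determined by x yet both coefficients come out at essentially zero.

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical U-shaped data: y depends entirely on x, but not monotonically
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

pearson_corr, _ = pearsonr(x, y)
spearman_corr, _ = spearmanr(x, y)

# Both coefficients are (numerically) zero despite the exact relationship
print(f"Pearson:  {abs(pearson_corr):.2f}")
print(f"Spearman: {abs(spearman_corr):.2f}")
```

Plotting the data before trusting a low coefficient is a simple way to catch patterns like this.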

Use Pearson when both variables are numeric and the relationship is roughly linear. Use Spearman when the relationship is monotonic but not linear, or when the data is ordinal (ranked).
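For ranked data, one possible approach is to map the ordered categories to integers and pass them to spearmanr. The category names, the mapping, and the ratings below are all hypothetical and exist only to illustrate the idea.

```python
from scipy.stats import spearmanr

# Hypothetical ordinal scale: only the order of the levels matters
level_order = {"low": 1, "medium": 2, "high": 3}

# Hypothetical survey data: effort level and a 1-5 satisfaction rating
effort = ["low", "low", "medium", "medium", "high", "high"]
rating = [2, 1, 3, 4, 4, 5]

# Encode the categories as integers that preserve their order
effort_encoded = [level_order[level] for level in effort]

correlation, p_value = spearmanr(effort_encoded, rating)
print(f"Spearman correlation: {correlation:.2f}")
```

Because Spearman works on ranks, any encoding that keeps the same order (1, 2, 3 or 10, 20, 30) gives the same coefficient.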

Always check the p-value to see whether the correlation is statistically significant. A common threshold is p < 0.05, but it is a convention rather than a fixed rule.
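The significance check can be sketched as a small helper. The function name is_significant and its alpha parameter are hypothetical, introduced only for this example; the data is the hours-studied sample from the program above.

```python
from scipy.stats import pearsonr

def is_significant(x, y, alpha=0.05):
    """Return (correlation, p_value, significant) for two equal-length samples.

    Hypothetical helper for illustration; alpha=0.05 is a convention, not a rule.
    """
    correlation, p_value = pearsonr(x, y)
    return correlation, p_value, p_value < alpha

hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
exam_scores = [50, 55, 65, 70, 75, 80, 85, 90]

corr, p, significant = is_significant(hours_studied, exam_scores)
print(f"r = {corr:.2f}, p = {p:.4g}, significant at 0.05: {significant}")
```

With only a handful of data points, a significant p-value still deserves caution, so it helps to report the sample size alongside the coefficient.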

Summary

Correlation analysis shows how two variables move together.

Pearson measures linear relationships; Spearman measures monotonic relationships.

Both give a correlation value and a p-value to check significance.