
Correlation analysis (Pearson, Spearman) in Data Analysis Python - Step-by-Step Execution

Concept Flow - Correlation analysis (Pearson, Spearman)
Start with two data sets
Choose correlation type
Pearson or Spearman
Calculate correlation coefficient
Interpret strength and direction
End
We start with two sets of data, choose Pearson or Spearman correlation, calculate the coefficient, then interpret the result.
Execution Sample
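from scipy.stats import pearsonr, spearmanr

# Two small samples where y = x + 4 exactly
x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 9]

# Each function returns (coefficient, p-value); the p-value is discarded here
pearson_corr, _ = pearsonr(x, y)
spearman_corr, _ = spearmanr(x, y)

print(pearson_corr, spearman_corr)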
This code calculates Pearson and Spearman correlation coefficients between two lists of numbers.
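The second value returned by each function, discarded above as `_`, is a p-value. As a small sketch of how it could be kept and inspected, the example below uses made-up data (`hours` and `score` are hypothetical and not part of the original example):

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: hours studied vs. test score (illustrative only)
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 58, 70, 68, 75, 80]

# Both functions return a result that unpacks to (coefficient, p-value)
pearson_r, pearson_p = pearsonr(hours, score)
spearman_rho, spearman_p = spearmanr(hours, score)

print(f"Pearson  r   = {pearson_r:.3f}, p-value = {pearson_p:.4f}")
print(f"Spearman rho = {spearman_rho:.3f}, p-value = {spearman_p:.4f}")
```

A small p-value suggests the observed correlation would be unlikely if the two variables were actually unrelated. With only a handful of points, it is best treated as a rough guide.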
Execution Table
| Step | Action | Input Data | Calculation | Result |
|---|---|---|---|---|
| 1 | Prepare data | x=[1,2,3,4,5], y=[5,6,7,8,9] | None | Data ready |
| 2 | Calculate Pearson correlation | x, y | Compute covariance and standard deviations | Pearson r = 1.0 |
| 3 | Calculate Spearman correlation | x, y | Rank data and compute Pearson on ranks | Spearman rho = 1.0 |
| 4 | Print results | Pearson r=1.0, Spearman rho=1.0 | Output values | 1.0 1.0 |
| 5 | End | None | None | Execution complete |
💡 All calculations done, results printed, execution ends
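Step 2 of the table summarises Pearson's calculation as "covariance and standard deviations". As a rough sketch of that arithmetic, the coefficient is the covariance divided by the product of the two standard deviations. The helper name `pearson_manual` is made up for illustration and is not part of SciPy:

```python
import math

def pearson_manual(x, y):
    """Pearson r = covariance / (std of x * std of y).

    The 1/n factors in the covariance and the variances cancel,
    so plain sums of deviations are enough.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 9]
r_manual = pearson_manual(x, y)
print(r_manual)  # 1.0
```

For this data the deviations from the mean are identical for x and y, so the numerator and the denominator both equal 10 and the result is exactly 1.0.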
Variable Tracker
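| Variable | Start | After Step 1 | After Step 2 | After Step 3 | Final |
|---|---|---|---|---|---|
| x | None | [1,2,3,4,5] | [1,2,3,4,5] | [1,2,3,4,5] | [1,2,3,4,5] |
| y | None | [5,6,7,8,9] | [5,6,7,8,9] | [5,6,7,8,9] | [5,6,7,8,9] |
| pearson_corr | None | None | 1.0 | 1.0 | 1.0 |
| spearman_corr | None | None | None | 1.0 | 1.0 |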
Key Moments - 3 Insights
Why do Pearson and Spearman correlations give the same result here?
Because y = x + 4 is an exactly linear, increasing relationship, Pearson r is 1.0. The data is also strictly increasing with no ties, so the ranks of x and y match exactly and Spearman rho is 1.0 as well, as shown in steps 2 and 3.
What does a correlation coefficient of 1.0 mean?
It means a perfect positive relationship between x and y: every increase in x is matched by an increase in y (for Pearson, along an exact straight line), as seen in the results in step 4.
Why do we calculate Spearman correlation by ranking data?
Spearman correlation replaces each value with its rank and then measures how well the two sets of ranks agree. This captures any monotonic relationship, linear or not, and makes the coefficient less sensitive to outliers, as explained in step 3.
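To illustrate the ranking idea, here is a minimal pure-Python sketch that computes Spearman's rho as Pearson's r on ranks. The helper names (`average_ranks`, `pearson`, `spearman`) and the cubed data are assumptions made for this example. In practice `scipy.stats.spearmanr` does this work for you.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson r from sums of deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman rho: Pearson r computed on the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: y grows with x, but along a curve (y = x ** 3)
x = [1, 2, 3, 4, 5]
y_cubed = [1, 8, 27, 64, 125]

print(f"Pearson:  {pearson(x, y_cubed):.3f}")   # 0.943
print(f"Spearman: {spearman(x, y_cubed):.3f}")  # 1.000
```

Pearson falls below 1 because the points do not lie on a straight line, while Spearman stays at 1 because the ordering of x and y is identical.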
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, what is the Pearson correlation coefficient after step 2?
A. 1.0
B. 0.5
C. -0.9
D. 0.9
💡 Hint
Check the 'Result' column in the row for step 2 of the Execution Table.
At which step do we rank the data to calculate Spearman correlation?
A. Step 1
B. Step 2
C. Step 3
D. Step 4
💡 Hint
Look at the 'Calculation' column for step 3 in the Execution Table.
If the data contained a few extreme outliers, which correlation would be more reliable?
A. Pearson
B. Spearman
C. Both equal
D. Neither
💡 Hint
Refer to the Key Moments explanation about ranking and sensitivity to outliers.
Concept Snapshot
Correlation analysis measures how two variables move together.
Pearson correlation measures linear relationships.
Spearman correlation measures monotonic relationships using ranks.
Values range from -1 (perfect negative) to 1 (perfect positive).
Use Pearson for linear relationships, Spearman for monotonic non-linear relationships or ordinal (ranked) data.
Full Transcript
Correlation analysis compares two data sets to find how they relate. Pearson correlation checks whether they change together along a straight line. Spearman correlation checks whether one consistently goes up (or down) as the other goes up, even if the pattern is not a straight line. We start with the data, pick Pearson or Spearman, calculate the coefficient, then check how close it is to 1 or -1 to judge how strongly the two are related. In the example, both correlations are 1.0, showing a perfect positive link.