
Pearson correlation in SciPy - Step-by-Step Execution

Concept Flow - Pearson correlation
1. Start with two data lists
2. Calculate the means of both lists
3. Calculate the deviations from the means
4. Multiply the deviations pairwise
5. Sum the products and calculate the covariance
6. Calculate the standard deviations
7. Divide the covariance by the product of the standard deviations
8. Output the Pearson correlation coefficient
Pearson correlation measures how closely two sets of numbers follow a straight-line relationship, from -1 (they move in exactly opposite directions) to 1 (they move in exactly the same direction).
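The flow above can be condensed into a single formula. This is the standard textbook form, written here as a sketch; the symbols \bar{x} and \bar{y} (the two means) are notation introduced for illustration and do not appear elsewhere on this page. The divisor (n - 1) from the covariance and the standard deviations cancels, so it does not appear.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\qquad\text{for the sample data: } r = \frac{20}{\sqrt{10}\,\sqrt{40}} = \frac{20}{20} = 1
```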
Execution Sample
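from scipy.stats import pearsonr

# Two lists of equal length; y is exactly 2 * x
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# pearsonr returns the correlation coefficient and a p-value
corr, p_value = pearsonr(x, y)
print(corr)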
This calculates the Pearson correlation between the two lists x and y and prints the coefficient.
Execution Table
Step | Action | Intermediate Values | Result
1 | Calculate mean of x | mean_x = (1+2+3+4+5)/5 = 3.0 | mean_x = 3.0
2 | Calculate mean of y | mean_y = (2+4+6+8+10)/5 = 6.0 | mean_y = 6.0
3 | Calculate deviations x - mean_x | [-2, -1, 0, 1, 2] | deviations_x = [-2, -1, 0, 1, 2]
4 | Calculate deviations y - mean_y | [-4, -2, 0, 2, 4] | deviations_y = [-4, -2, 0, 2, 4]
5 | Multiply deviations pairwise | [-2*-4, -1*-2, 0*0, 1*2, 2*4] = [8, 2, 0, 2, 8] | products = [8, 2, 0, 2, 8]
6 | Sum products | 8 + 2 + 0 + 2 + 8 = 20 | sum_products = 20
7 | Calculate covariance | cov = sum_products / (5 - 1) = 20 / 4 = 5.0 | covariance = 5.0
8 | Calculate std dev of x | sqrt(sum((x_i - mean_x)^2) / (5 - 1)) = sqrt(10 / 4) = 1.5811 | std_x = 1.5811
9 | Calculate std dev of y | sqrt(sum((y_i - mean_y)^2) / (5 - 1)) = sqrt(40 / 4) = 3.1623 | std_y = 3.1623
10 | Calculate Pearson correlation | covariance / (std_x * std_y) = 5.0 / (1.5811 * 3.1623) = 1.0 | correlation = 1.0
11 | Output correlation | correlation = 1.0 | 1.0
💡 All data processed, Pearson correlation calculated as 1.0
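The steps in the table can be reproduced by hand in plain Python. This is a minimal sketch that mirrors the table using only the standard library; it is not how SciPy implements pearsonr internally, and names such as dev_x and dev_y are chosen here for illustration.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
n = len(x)

# Steps 1-2: means of both lists
mean_x = sum(x) / n
mean_y = sum(y) / n

# Steps 3-4: deviations from the means
dev_x = [xi - mean_x for xi in x]
dev_y = [yi - mean_y for yi in y]

# Steps 5-6: pairwise products and their sum
products = [dx * dy for dx, dy in zip(dev_x, dev_y)]
sum_products = sum(products)

# Step 7: sample covariance (divide by n - 1)
covariance = sum_products / (n - 1)

# Steps 8-9: sample standard deviations (divide by n - 1)
std_x = math.sqrt(sum(dx ** 2 for dx in dev_x) / (n - 1))
std_y = math.sqrt(sum(dy ** 2 for dy in dev_y) / (n - 1))

# Step 10: covariance divided by the product of the standard deviations
correlation = covariance / (std_x * std_y)

# Step 11: output, rounded to hide floating-point noise
print(round(correlation, 10))
```

Because of floating-point rounding, the unrounded result may differ from 1.0 in the last digits, which is why the final print rounds it.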
Variable Tracker
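Variable | Start | After Step 3 | After Step 4 | After Step 5 | After Step 6 | After Step 7 | After Step 8 | After Step 9 | After Step 10 | Final
x | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5]
y | [2, 4, 6, 8, 10] | [2, 4, 6, 8, 10] | [2, 4, 6, 8, 10] | [2, 4, 6, 8, 10] | [2, 4, 6, 8, 10] | [2, 4, 6, 8, 10] | [2, 4, 6, 8, 10] | [2, 4, 6, 8, 10] | [2, 4, 6, 8, 10] | [2, 4, 6, 8, 10]
deviations_x | N/A | [-2, -1, 0, 1, 2] | [-2, -1, 0, 1, 2] | [-2, -1, 0, 1, 2] | [-2, -1, 0, 1, 2] | [-2, -1, 0, 1, 2] | [-2, -1, 0, 1, 2] | [-2, -1, 0, 1, 2] | [-2, -1, 0, 1, 2] | [-2, -1, 0, 1, 2]
deviations_y | N/A | N/A | [-4, -2, 0, 2, 4] | [-4, -2, 0, 2, 4] | [-4, -2, 0, 2, 4] | [-4, -2, 0, 2, 4] | [-4, -2, 0, 2, 4] | [-4, -2, 0, 2, 4] | [-4, -2, 0, 2, 4] | [-4, -2, 0, 2, 4]
products | N/A | N/A | N/A | [8, 2, 0, 2, 8] | [8, 2, 0, 2, 8] | [8, 2, 0, 2, 8] | [8, 2, 0, 2, 8] | [8, 2, 0, 2, 8] | [8, 2, 0, 2, 8] | [8, 2, 0, 2, 8]
sum_products | N/A | N/A | N/A | N/A | 20 | 20 | 20 | 20 | 20 | 20
covariance | N/A | N/A | N/A | N/A | N/A | 5.0 | 5.0 | 5.0 | 5.0 | 5.0
std_x | N/A | N/A | N/A | N/A | N/A | N/A | 1.5811 | 1.5811 | 1.5811 | 1.5811
std_y | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 3.1623 | 3.1623 | 3.1623
correlation | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 1.0 | 1.0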
Key Moments - 3 Insights
Why do we divide by (n-1) instead of n when calculating covariance and standard deviation?
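Dividing by (n-1) gives an unbiased estimate of the variance and covariance for sample data, as shown in steps 7, 8, and 9 of the Execution Table. For the correlation itself the choice does not matter: the same divisor appears in the numerator and the denominator of step 10, so it cancels.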
Why is the Pearson correlation exactly 1.0 in this example?
Because y is exactly twice x, every point lies on a straight line with positive slope, so their deviations move perfectly together, making the correlation 1.0 as shown in step 10.
What do the deviations represent in steps 3 and 4?
Deviations show how far each value is from the mean, which helps measure how x and y move relative to their averages.
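The first two insights can be checked numerically. The sketch below uses a small helper, pearson, written for this illustration (it is not a SciPy function), with a hypothetical ddof parameter that selects the divisor n - ddof. The list noisy_y is made-up data that is not perfectly linear.

```python
import math

def pearson(x, y, ddof=1):
    """Pearson r with covariance and variances divided by (n - ddof)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - ddof)
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - ddof)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - ddof)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
noisy_y = [2, 1, 4, 3, 6]  # hypothetical data, not perfectly linear

# Insight 1: dividing by n - 1 or by n yields the same correlation
r_sample = pearson(x, noisy_y, ddof=1)
r_population = pearson(x, noisy_y, ddof=0)
print(math.isclose(r_sample, r_population))

# Insight 2: an exact linear relation gives 1 (positive slope) or -1 (negative slope)
r_doubled = pearson(x, [2 * v for v in x])
r_shifted = pearson(x, [2 * v + 7 for v in x])
r_negated = pearson(x, [-2 * v for v in x])
print(round(r_doubled, 6), round(r_shifted, 6), round(r_negated, 6))
```

Adding a constant to y (the r_shifted case) leaves the deviations unchanged, which is why the correlation stays the same.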
Visual Quiz - 3 Questions
Test your understanding
Look at step 5 of the Execution Table. What is the product of deviations for the third pair (x=3, y=6)?
A. 3
B. 0
C. 6
D. -3
💡 Hint
Check the 'Multiply deviations pairwise' row (step 5) in the Execution Table.
At which step is the standard deviation of y calculated?
A. Step 9
B. Step 8
C. Step 7
D. Step 10
💡 Hint
Look for 'Calculate std dev of y' in the Execution Table.
If the values in y were all the same, how would the Pearson correlation change?
A. It would be 1.0
B. It would be 0
C. It would be undefined or cause an error
D. It would be -1.0
💡 Hint
Recall that the standard deviation of y would be zero, causing a division by zero in step 10.
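The division by zero mentioned in the hint can be seen by repeating the hand calculation with a constant list. This is a sketch in plain Python with a made-up constant y; it deliberately maps the failure to nan. SciPy's pearsonr is understood to behave similarly for constant input (a warning plus a nan result rather than an exception), but treat that as an assumption to verify against your installed version.

```python
import math

x = [1, 2, 3, 4, 5]
y = [7, 7, 7, 7, 7]  # hypothetical constant list
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Every deviation of y is zero, so both the covariance and std_y are zero
covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))

try:
    correlation = covariance / (std_x * std_y)
except ZeroDivisionError:
    # The correlation is undefined when one list has no spread
    correlation = float("nan")

print(std_y, correlation)
```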
Concept Snapshot
Pearson correlation measures the strength and direction of the linear relationship between two numeric lists.
Use scipy.stats.pearsonr(x, y) to get the correlation coefficient and a p-value.
The coefficient ranges from -1 (perfect opposite movement) to 1 (perfect movement in the same direction).
It is calculated as the covariance divided by the product of the standard deviations.
Both lists must have the same length and contain at least two data points.
Full Transcript
Pearson correlation shows how two sets of numbers move together. We start by finding the average of each list. Then, for each number, we find how far it is from the average. We multiply these differences pair by pair and add them up. Dividing this sum by n - 1 gives the covariance. We also find how spread out each list is using the standard deviation. Finally, we divide the covariance by the product of the standard deviations to get the correlation number. This number tells us whether the lists move together (close to 1), move in opposite directions (close to -1), or show no linear relationship (close to 0).