
Pearson correlation in SciPy - Time & Space Complexity

Time Complexity: Pearson correlation
O(n)
Understanding Time Complexity

We want to understand how the time to calculate Pearson correlation changes as the size of the data grows.

How does the number of calculations increase when we have more data points?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.


import numpy as np
from scipy.stats import pearsonr

n = 1000  # example size
x = np.random.rand(n)  # n random values
y = np.random.rand(n)

# Returns the correlation coefficient and a two-sided p-value
corr, p_value = pearsonr(x, y)


This code calculates the Pearson correlation coefficient between two arrays of length n.

Identify Repeating Operations

Identify the loops, recursion, or array traversals that repeat.

  • Primary operation: Summation over all elements in the arrays to compute means, variances, and covariance.
  • How many times: Each element is visited once in these summations, so n times.
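
To see why each element is visited only a constant number of times, here is a minimal sketch of the Pearson formula computed directly from those sums (the helper name `pearson_manual` is ours, not SciPy's); each intermediate quantity is a single O(n) pass over the arrays:

```python
import numpy as np
from scipy.stats import pearsonr

def pearson_manual(x, y):
    # Means, covariance, and standard deviations are each one O(n) sum.
    mx, my = x.mean(), y.mean()           # O(n) each
    cov = np.sum((x - mx) * (y - my))     # O(n)
    sx = np.sqrt(np.sum((x - mx) ** 2))   # O(n)
    sy = np.sqrt(np.sum((y - my) ** 2))   # O(n)
    return cov / (sx * sy)

rng = np.random.default_rng(0)
x = rng.random(1000)
y = rng.random(1000)

r_manual = pearson_manual(x, y)
r_scipy, _ = pearsonr(x, y)
print(r_manual, r_scipy)  # the two values agree
```

A fixed number of O(n) passes is still O(n) overall, which is the key point: no step ever compares pairs of points.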
How Execution Grows With Input

As the number of data points n increases, the number of calculations grows roughly in direct proportion.

Input Size (n)    Approx. Operations
10                About 10 operations
100               About 100 operations
1000              About 1000 operations

Pattern observation: Doubling the data size roughly doubles the work needed.
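
You can check this doubling pattern empirically with a rough timing sketch (wall-clock timings are noisy, especially for small inputs, so treat the ratio as approximate rather than exactly 2):

```python
import time
import numpy as np
from scipy.stats import pearsonr

def avg_pearsonr_time(n, repeats=20):
    # Average the runtime over several calls to smooth out noise.
    rng = np.random.default_rng(42)
    x, y = rng.random(n), rng.random(n)
    start = time.perf_counter()
    for _ in range(repeats):
        pearsonr(x, y)
    return (time.perf_counter() - start) / repeats

t_small = avg_pearsonr_time(100_000)
t_large = avg_pearsonr_time(200_000)

# For a linear algorithm, doubling n should roughly double the time.
print(f"time ratio (2n vs n): {t_large / t_small:.2f}")
```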

Final Time Complexity

Time Complexity: O(n)

This means the time to compute Pearson correlation grows linearly with the number of data points.

Common Mistake

[X] Wrong: "Calculating Pearson correlation takes quadratic time because it compares every pair of points."

[OK] Correct: The calculation uses sums over the data arrays, not pairwise comparisons, so it only needs to look at each data point once.

Interview Connect

Understanding how Pearson correlation scales helps you explain performance when working with large datasets, a useful skill in data science roles.

Self-Check

"What if we calculated Pearson correlation for multiple pairs of arrays, each of length n? How would the time complexity change?"
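
As a hint for working through the self-check, here is a sketch for k pairs of arrays, each of length n (the array shapes below are our own example setup): one O(n) call per pair gives O(k * n) total work.

```python
import numpy as np
from scipy.stats import pearsonr

k, n = 5, 1000  # k pairs of arrays, each of length n
rng = np.random.default_rng(1)
xs = rng.random((k, n))
ys = rng.random((k, n))

# One O(n) pearsonr call per pair -> O(k * n) total.
corrs = [pearsonr(xs[i], ys[i])[0] for i in range(k)]
print(corrs)
```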