0
0
Data Analysis Pythondata~5 mins

Correlation analysis (Pearson, Spearman) in Data Analysis Python - Time & Space Complexity

Choose your learning style9 modes available
Time Complexity: Correlation analysis (Pearson, Spearman)
O(n log n)
Understanding Time Complexity

We want to know how the time needed to calculate correlation changes as the data grows.

How does the work increase when we have more data points?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd
from scipy.stats import pearsonr, spearmanr

def compute_correlations(df, col1, col2):
    pearson_corr, _ = pearsonr(df[col1], df[col2])
    spearman_corr, _ = spearmanr(df[col1], df[col2])
    return pearson_corr, spearman_corr

This code calculates Pearson and Spearman correlation between two columns in a data table.

Identify Repeating Operations

Identify the loops, recursion, array traversals that repeat.

  • Pearson: Single passes over data to compute means, std devs, covariance: O(n).
  • Spearman: Sorting to compute ranks: O(n log n), followed by linear correlation on ranks: O(n).
  • Dominant operation: Sorting in Spearman.
How Execution Grows With Input

As the number of data points n increases, the time grows as O(n log n), dominated by sorting for Spearman ranks.

Input Size (n)Approx. Operations
10About 100 (linear passes + sorting ~2×n log n)
100About 1,500 (linear + sorting)
1000About 20,000 (linear + sorting)

Pattern observation: Grows faster than linear due to log n factor in sorting, but still very efficient.

Final Time Complexity

Time Complexity: O(n log n)

This is dominated by the sorting step required for Spearman ranking; Pearson is O(n).

Common Mistake

[X] Wrong: "Correlation calculation time grows much faster than the data size because it compares every pair of points."

[OK] Correct: No pairwise comparisons (that would be O(n2)). Pearson uses O(n) summary statistics; Spearman uses O(n log n) sorting for ranks.

Interview Connect

Understanding how correlation calculations scale helps you explain data analysis efficiency clearly and confidently.

Self-Check

"What if we wanted to compute correlations between every pair of columns in a dataset? How would the time complexity change?"