Correlation matrix visualization in Matplotlib - Time & Space Complexity
We want to understand how the time needed to create a correlation matrix visualization changes as the data size grows.
How does the number of variables affect the work Matplotlib does to draw the matrix?
Analyze the time complexity of the following code snippet.
import matplotlib.pyplot as plt
import numpy as np
# Generate random data
np.random.seed(0)
data = np.random.rand(100, 5) # 100 rows, 5 variables
# Compute correlation matrix
corr = np.corrcoef(data, rowvar=False)
# Plot correlation matrix
plt.imshow(corr, cmap='coolwarm', vmin=-1, vmax=1)
plt.colorbar()
plt.show()
This code creates a correlation matrix from 5 variables and shows it as a colored grid.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Calculating the correlation between each pair of variables, then drawing one colored cell per pair.
- How many times: For n variables, the matrix has n x n entries, so about n x n correlations are computed (each from all m rows of data) and n x n cells are drawn.
As the number of variables increases, the number of pairs grows quickly.
| Input Size (n) | Approx. Operations |
|---|---|
| 5 | 25 |
| 10 | 100 |
| 100 | 10,000 |
Pattern observation: The number of operations grows with the square of the number of variables.
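The table can be checked directly by counting the entries of the correlation matrix for each input size. The following is a minimal sketch; the helper name `corr_matrix_cells` and its default arguments are made up for illustration and are not part of the original snippet.

```python
import numpy as np

def corr_matrix_cells(n_vars, n_rows=100, seed=0):
    """Return how many entries the correlation matrix has for n_vars variables.

    Hypothetical helper for illustration: it builds random data the same way
    as the snippet above and counts the cells that would be drawn.
    """
    rng = np.random.default_rng(seed)
    data = rng.random((n_rows, n_vars))      # n_rows observations, n_vars variables
    corr = np.corrcoef(data, rowvar=False)   # shape (n_vars, n_vars)
    return corr.size

for n in (5, 10, 100):
    print(n, corr_matrix_cells(n))
# prints:
# 5 25
# 10 100
# 100 10000
```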
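Time Complexity: O(n^2)
This means if you double the number of variables, the work to compute and draw the matrix roughly quadruples, assuming the number of rows stays fixed. With m rows of data, computing the correlations costs about O(m * n^2), because every entry of the matrix is built from all m observations.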
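One way to see this growth in practice is to time `np.corrcoef` for a few values of n while keeping the number of rows fixed. This is only a rough sketch: `time_corrcoef` is a hypothetical helper, the sizes are arbitrary, and measured ratios will rarely be exactly 4x because of caching, optimized linear algebra routines, and fixed overheads at small sizes.

```python
import time
import numpy as np

def time_corrcoef(n_vars, n_rows=200, repeats=5, seed=0):
    """Return the best-of-`repeats` wall-clock time (seconds) for np.corrcoef.

    Hypothetical helper for illustration; n_rows is held fixed so that only
    the number of variables changes between measurements.
    """
    rng = np.random.default_rng(seed)
    data = rng.random((n_rows, n_vars))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.corrcoef(data, rowvar=False)
        best = min(best, time.perf_counter() - start)
    return best

# Double the number of variables each step and compare the timings.
for n in (50, 100, 200, 400):
    print(f"n={n:4d}  best time: {time_corrcoef(n):.6f} s")
```

Taking the best of several repeats reduces noise from other processes; the trend across sizes matters more than any single number.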
[X] Wrong: "The time to create the correlation matrix grows linearly with the number of variables."
[OK] Correct: Because each variable pairs with every other variable, the number of pairs grows much faster than the number of variables alone.
Understanding how visualization time grows with data size helps you explain performance and scalability clearly in real projects.
"What if we only visualize the upper triangle of the correlation matrix? How would the time complexity change?"
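As a starting point for this question, the sketch below masks the diagonal and lower triangle so that only the n(n-1)/2 distinct pairs remain. The helper name `upper_triangle_only` is made up for illustration. Note that `np.corrcoef` still computes the full matrix here, and n(n-1)/2 still grows with n^2, so under these assumptions the complexity class stays O(n^2) and only the constant factor of the drawing step shrinks.

```python
import numpy as np

def upper_triangle_only(corr):
    """Return corr as a masked array with the diagonal and lower triangle hidden.

    Hypothetical helper for illustration.
    """
    # True marks cells to hide: everything on or below the main diagonal.
    mask = np.tril(np.ones_like(corr, dtype=bool))
    return np.ma.masked_array(corr, mask=mask)

rng = np.random.default_rng(0)
data = rng.random((100, 5))               # 100 rows, 5 variables
corr = np.corrcoef(data, rowvar=False)

upper = upper_triangle_only(corr)
n = corr.shape[0]
print(upper.count(), n * (n - 1) // 2)    # visible cells vs. n(n-1)/2
# prints: 10 10
```

The masked array can be passed to `plt.imshow` in place of `corr` in the original snippet; masked cells are left undrawn.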