Multiple histograms overlay in Matplotlib - Time & Space Complexity
We want to understand how the time to draw multiple histograms changes as we add more data.
How does the work grow when overlaying several histograms in one plot?
Analyze the time complexity of the following code snippet.
```python
import matplotlib.pyplot as plt
import numpy as np

# Generate data
np.random.seed(0)
data1 = np.random.normal(0, 1, 1000)
data2 = np.random.normal(1, 1.5, 1000)

# Plot two histograms overlaid
plt.hist(data1, bins=30, alpha=0.5)
plt.hist(data2, bins=30, alpha=0.5)
plt.show()
```
This code creates two histograms from two datasets and draws them on the same plot with some transparency.
Identify the loops, recursion, and array traversals that repeat.
- Primary operation: Counting how many data points fall into each bin for each histogram. Matplotlib hands this step to `numpy.histogram`.
- How many times: For each histogram, the data array is scanned once to assign every point to a bin.
- Since there are two histograms, this counting happens twice, once per dataset.
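The counting step described above can be sketched as a plain Python loop. This is only an illustration of the single scan, not how NumPy implements it internally; the helper name `count_into_bins` is hypothetical, and the sketch assumes equal-width bins and data that is not all one value.

```python
import numpy as np

def count_into_bins(data, bins=30):
    """Count points per equal-width bin with one scan (hypothetical helper)."""
    low, high = data.min(), data.max()   # finding the range also costs O(n)
    width = (high - low) / bins          # assumes high > low
    counts = [0] * bins
    for x in data:                       # the single scan over all n points
        # The maximum value would index one past the end, so clamp it
        # into the last bin.
        idx = min(int((x - low) / width), bins - 1)
        counts[idx] += 1
    return counts

rng = np.random.default_rng(0)
data1 = rng.normal(0, 1, 1000)
data2 = rng.normal(1, 1.5, 1000)

# Two histograms mean two independent scans, one per dataset.
counts1 = count_into_bins(data1)
counts2 = count_into_bins(data2)
```

Each call visits every point exactly once, so the work for two datasets is roughly twice the work for one.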
As the number of data points grows, the counting work grows proportionally for each histogram.
| Points per histogram (n) | Approx. counting operations |
|---|---|
| 10 | ~20 (2 histograms × 10 points each) |
| 100 | ~200 (2 × 100) |
| 1000 | ~2000 (2 × 1000) |
Pattern observation: The operations grow linearly with the number of data points and the number of histograms.
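The figures in the table can be reproduced with a small sketch that bins `k` datasets and tallies how many points were counted in total. The function name `total_points_binned` is an assumption made for illustration; it uses `numpy.histogram` directly rather than drawing anything.

```python
import numpy as np

def total_points_binned(n, k=2, bins=30):
    """Bin k datasets of n points each and return how many points were counted."""
    rng = np.random.default_rng(0)
    visited = 0
    for _ in range(k):
        data = rng.normal(0, 1, n)
        counts, edges = np.histogram(data, bins=bins)
        # With the default range, every point lands in exactly one bin,
        # so the counts sum to n.
        visited += int(counts.sum())
    return visited

for n in (10, 100, 1000):
    print(n, total_points_binned(n))
# prints:
# 10 20
# 100 200
# 1000 2000
```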
Time Complexity: O(k * n)
This means the time grows linearly with both the number of histograms (k) and the number of data points per histogram (n), assuming the number of bins stays fixed (30 in this example).
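One rough way to check this growth empirically is to time the binning step for a few values of k and n. The sketch below is a minimal example under some assumptions: it times only `numpy.histogram`, not the drawing, and the helper name `time_binning` is hypothetical. Wall-clock numbers are noisy and include fixed overhead, so expect an approximate trend rather than exact doubling.

```python
import timeit
import numpy as np

def time_binning(n, k, bins=30, repeats=5):
    """Return the best-of-`repeats` time (seconds) to bin k datasets of n points."""
    rng = np.random.default_rng(0)
    datasets = [rng.normal(0, 1, n) for _ in range(k)]

    def bin_all():
        for data in datasets:
            np.histogram(data, bins=bins)

    # Take the minimum to reduce the effect of background noise.
    return min(timeit.repeat(bin_all, number=1, repeat=repeats))

for k in (1, 2, 4):
    for n in (10_000, 100_000):
        print(f"k={k}, n={n}: {time_binning(n, k):.6f} s")
```

For small inputs the fixed per-call overhead tends to dominate, so the linear trend usually shows up more clearly at larger n.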
[X] Wrong: "Overlaying multiple histograms only takes the same time as one histogram because they share the plot area."
[OK] Correct: Each histogram requires scanning its own data to count the points in each bin, so the work adds up with every histogram you overlay.
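A short sketch can make this visible: each `hist` call returns its own counts, its own bin edges, and its own set of bars, even though both are drawn on the same axes. The example assumes the non-interactive `Agg` backend so it runs without a display, and it uses the object-oriented `ax.hist` call in place of `plt.hist`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen so no window is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data1 = rng.normal(0, 1, 1000)
data2 = rng.normal(1, 1.5, 1000)

fig, ax = plt.subplots()
# Each call bins its own dataset and adds its own rectangles to the axes.
counts1, edges1, bars1 = ax.hist(data1, bins=30, alpha=0.5)
counts2, edges2, bars2 = ax.hist(data2, bins=30, alpha=0.5)

print(int(counts1.sum()), int(counts2.sum()))  # points counted per histogram
print(len(ax.patches))                         # rectangles on the shared axes
```

Both datasets are fully counted (1000 points each) and the axes end up holding 60 rectangles, 30 per histogram. Sharing the plot area does not share the counting work.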
Understanding how plotting multiple datasets affects performance helps you explain trade-offs in data visualization tasks.
What if we increased the number of bins instead of the number of histograms? How would the time complexity change?