Histogram Creation in Matplotlib - Performance Analysis
We want to understand how the time to create a histogram changes as we add more data points.
How does the work grow when the input data size grows?
Analyze the time complexity of the following code snippet.
```python
import matplotlib.pyplot as plt
import numpy as np

n = 1000                    # example number of data points
data = np.random.randn(n)   # n random samples
plt.hist(data, bins=10)     # bin the data and draw the bars
plt.show()
```
This code creates a histogram with 10 bins from n random data points.
Identify the loops, recursion, or array traversals that repeat work.
- Primary operation: Assigning each of the n data points to one of the 10 bins.
- How many times: Once for each data point, so n times.
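To make the "n bin assignments" concrete, here is a pure-Python sketch of equal-width binning (illustrative only; the real `plt.hist` delegates to NumPy's optimized binning, and the helper name `assign_to_bins` is an invention for this example):

```python
def assign_to_bins(data, num_bins=10):
    """Assign each value to one of num_bins equal-width bins in a single O(n) pass."""
    lo, hi = min(data), max(data)          # finding the range is itself O(n)
    width = (hi - lo) / num_bins or 1.0    # guard against a zero-width range
    counts = [0] * num_bins
    for x in data:                         # the O(n) loop: one bin lookup per point
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts

# counts == [3, 2]: each of the 5 points is examined exactly once
counts = assign_to_bins([0.1, 0.4, 0.5, 0.9, 1.0], num_bins=2)
```

Each data point triggers exactly one constant-time bin lookup, which is where the n-operations count in the table below comes from.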
As we add more data points, the number of operations grows in direct proportion to the number of points.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 |
| 100 | 100 |
| 1000 | 1000 |
Pattern observation: The work grows linearly with the number of data points.
Time Complexity: O(n)
This means the time to build the histogram grows directly with the number of data points.
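A rough way to check this linear pattern empirically is to time the binning step for growing n. This sketch times `np.histogram` (the binning routine `plt.hist` uses internally) so no window needs to be shown; absolute times depend on your machine:

```python
import time
import numpy as np

# Time the binning step for growing n with a fixed 10 bins.
sizes = [10_000, 100_000, 1_000_000]
timings = []
for n in sizes:
    data = np.random.randn(n)
    start = time.perf_counter()
    np.histogram(data, bins=10)
    timings.append(time.perf_counter() - start)

for n, t in zip(sizes, timings):
    print(f"n={n:>9,}  time={t:.5f}s")
```

If the O(n) analysis holds, each tenfold increase in n should make the binning step take roughly ten times longer (small n may be dominated by fixed overhead).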
[X] Wrong: "The number of bins affects the time complexity a lot."
[OK] Correct: The number of bins is typically a small, fixed constant; because it does not grow with the input size, it does not change the O(n) complexity.
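A quick sketch supports this: at a fixed n, with equal-width bins each point still needs only one bin lookup, so increasing the bin count should barely change the cost (exact timings vary by machine):

```python
import time
import numpy as np

# At a fixed n, compare binning time for 10 bins vs 1,000 bins.
n = 1_000_000
data = np.random.randn(n)

for bins in (10, 1_000):
    start = time.perf_counter()
    counts, _ = np.histogram(data, bins=bins)
    elapsed = time.perf_counter() - start
    print(f"bins={bins:>5}  time={elapsed:.5f}s  total counted={counts.sum()}")
```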
Understanding how histogram creation scales helps you explain data visualization performance clearly and confidently.
"What if the number of bins grew with the number of data points? How would the time complexity change?"