How Plotting Time Scales with Data Size in Matplotlib - Performance Analysis
We want to understand how the time to create statistical plots changes as the data size grows.
How does the plotting time increase when we add more data points?
Analyze the time complexity of the following code snippet.
```python
import matplotlib.pyplot as plt
import numpy as np

n = 1000  # example number of data points
data = np.random.randn(n)  # n data points
plt.hist(data, bins=30)
plt.show()
```
This code creates a histogram plot of n random data points using 30 bins.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Counting how many data points fall into each bin.
- How many times: Each of the n data points is checked once to find its bin.
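The per-point binning step above can be sketched directly with NumPy. This is a simplified model of the counting that `plt.hist` delegates to `np.histogram`, not its actual internals; the variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
data = rng.standard_normal(n)

# Build 30 equal-width bins spanning the data range (31 edges).
edges = np.linspace(data.min(), data.max(), 31)

# Each data point is assigned to a bin exactly once: n lookups total.
bin_indices = np.digitize(data, edges[1:-1])
counts = np.bincount(bin_indices, minlength=30)

print(counts.sum())  # every point lands in exactly one bin
```

Because each of the n points triggers exactly one bin lookup, the counting step performs n units of work, matching the table below.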
As we add more data points, the time to count and place them in bins grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 checks |
| 100 | About 100 checks |
| 1000 | About 1000 checks |
Pattern observation: Doubling the data roughly doubles the work needed to create the plot.
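This doubling pattern can be checked empirically. The sketch below times `np.histogram` rather than `plt.hist`, so that rendering overhead does not drown out the binning work; exact timings are machine-dependent, but each doubling of n should roughly double the measured time:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
sizes = [100_000, 200_000, 400_000]
timings = []

for n in sizes:
    data = rng.standard_normal(n)
    start = time.perf_counter()
    np.histogram(data, bins=30)  # the counting step, without rendering
    timings.append(time.perf_counter() - start)

for n, t in zip(sizes, timings):
    print(f"n={n:>7}: {t:.4f} s")
```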
Time Complexity: O(n)
This means the time to create the plot grows linearly with the number of data points.
[X] Wrong: "Adding more bins makes the plot take much longer because it loops over bins for each data point."
[OK] Correct: The main work is checking each data point once; the number of bins is usually fixed and small, so it does not multiply the work by n.
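A quick check supports this: with n fixed, raising the bin count does not multiply the per-point work, because each point still receives exactly one bin assignment (again using `np.histogram` as a stand-in for the counting step inside `plt.hist`):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(100_000)

for bins in [10, 100, 1000]:
    counts, _ = np.histogram(data, bins=bins)
    # Regardless of the bin count, each point is counted exactly once.
    print(bins, counts.sum())
```

The total count is always 100000: more bins spread the same n assignments across more buckets, they do not repeat them.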
Understanding how plotting time grows helps you explain performance behavior on large datasets and demonstrates that you can reason about efficiency in data visualization.
"What if we increased the number of bins proportionally to the number of data points? How would the time complexity change?"