Box plot with plt.boxplot in Matplotlib - Time & Space Complexity
We want to understand how the time to create a box plot changes as the amount of data grows.
How does the plotting time increase when we add more data points?
Analyze the time complexity of the following code snippet.
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(1000)
plt.boxplot(data)
plt.show()
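This code draws a box plot of 1000 values sampled from a standard normal distribution using Matplotlib.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Calculating the summary statistics (median, quartiles, whisker limits) from the data array, then finding the points that fall outside the whiskers.
- How many times: Every data point takes part in these calculations. Finding quartiles needs a selection or sort step, not just a single read, but the work still depends on all n points. Drawing is cheap by comparison: one box, two whiskers, and one marker per outlier.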
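The statistics step can be looked at on its own. The sketch below is a minimal illustration, not a description of everything `plt.boxplot` does: it calls `matplotlib.cbook.boxplot_stats`, the helper Matplotlib provides for computing box plot statistics, on the same kind of data. The seed and the variable names are assumptions chosen for the example.

```python
import numpy as np
from matplotlib import cbook

# Seeded generator so the example is repeatable (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
data = rng.standard_normal(1000)

# boxplot_stats returns one dict of statistics per dataset; we passed one dataset.
stats = cbook.boxplot_stats(data)[0]

print("q1:", stats["q1"], "median:", stats["med"], "q3:", stats["q3"])
print("whiskers:", stats["whislo"], stats["whishi"])
print("outliers drawn as individual markers:", len(stats["fliers"]))
```

Whatever the size of `data`, the result is a handful of numbers plus the outlier list, which is why the drawing step stays small while the statistics step grows with the input.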
As the number of data points increases, the time to compute the box plot statistics grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 reads and calculations |
| 100 | About 100 reads and calculations |
| 1000 | About 1000 reads and calculations |
Pattern observation: Doubling the data roughly doubles the work needed to compute the box plot.
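Time Complexity: O(n)
This means the time to compute the box plot statistics grows roughly linearly with the number of data points. The estimate assumes the quartiles are found with a selection step; an implementation that fully sorts the data first would be O(n log n), which is only slightly worse in practice.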
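One way to check the growth rate empirically is to time the statistics step for increasing input sizes. The following is a rough sketch under stated assumptions: the sizes are arbitrary, a single run per size is noisy, and only `matplotlib.cbook.boxplot_stats` is timed, not figure creation or rendering.

```python
import time

import numpy as np
from matplotlib import cbook

rng = np.random.default_rng(0)
timings = {}

# Sizes are illustrative; each step is 10x larger than the previous one.
for n in (10_000, 100_000, 1_000_000):
    data = rng.standard_normal(n)
    start = time.perf_counter()
    cbook.boxplot_stats(data)  # statistics only, no drawing
    timings[n] = time.perf_counter() - start
    print(f"n={n:>9,}: {timings[n]:.4f} s")
```

If the cost is close to linear, each tenfold increase in n should raise the measured time by roughly a factor of ten once n is large enough for fixed overhead to stop dominating. Repeating each measurement and taking the median gives steadier numbers.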
[X] Wrong: "Creating a box plot takes the same time no matter how much data there is."
[OK] Correct: The plot needs to read all data points to find medians and quartiles, so more data means more work.
Knowing how plotting time grows helps you understand performance when working with large datasets in data science projects.
"What if we used multiple box plots side by side for different groups? How would the time complexity change?"
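As a starting point for that question, here is a hedged sketch of several box plots drawn side by side. The group sizes are made up for illustration, and the `Agg` backend is selected only so the example runs without a display. Each group gets its own statistics pass, so a reasonable expectation is that the total work scales with the combined number of points across all groups.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical groups of different sizes.
groups = [rng.standard_normal(n) for n in (500, 1000, 2000)]

fig, ax = plt.subplots()
result = ax.boxplot(groups)  # one box per group, drawn side by side

total_points = sum(len(g) for g in groups)
print(len(result["boxes"]), "boxes from", total_points, "points")
```

With k groups of sizes n_1 through n_k, the statistics cost adds up to about O(n_1 + ... + n_k), plus a small per-box drawing cost that depends on k rather than on the data size.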