Scatter plots in Pandas - Time & Space Complexity
We want to understand how the time to create a scatter plot changes as the data size grows.
How does plotting many points affect the time it takes?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd
import matplotlib.pyplot as plt

n = 1000  # example size

data = pd.DataFrame({
    'x': range(n),
    'y': range(n)
})

plt.scatter(data['x'], data['y'])
plt.show()
```
This code builds a pandas DataFrame with n rows and passes its two columns to matplotlib, which draws one marker per row.
- Primary operation: Plotting each point on the scatter plot.
- How many times: Once for each of the n points in the data.
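As a quick check of the "once per point" claim, the sketch below inspects the object that matplotlib returns from `scatter`. It is a minimal illustration, not part of the original snippet: the `Agg` backend is an assumption made so the code runs without a display, and `num_markers` is a name introduced here.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

n = 1000
data = pd.DataFrame({"x": range(n), "y": range(n)})

fig, ax = plt.subplots()
points = ax.scatter(data["x"], data["y"])  # returns a PathCollection

# One (x, y) offset is stored per marker, so the count equals n.
num_markers = len(points.get_offsets())
print(num_markers)
plt.close(fig)
```

With n = 1000 this prints 1000: the number of markers to draw matches the number of rows.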
As the number of points increases, the time to plot grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 operations (plotting 10 points) |
| 100 | 100 operations (plotting 100 points) |
| 1000 | 1000 operations (plotting 1000 points) |
Pattern observation: The time grows linearly as the number of points increases.
Time Complexity: O(n)
This means the time to create the scatter plot grows in direct proportion to the number of points: doubling n roughly doubles the drawing work.
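One way to see this trend in practice is to time the plot at a few sizes. The following is a rough sketch under stated assumptions: the helper `time_scatter` and the chosen sizes are illustrative, and the `Agg` backend replaces `plt.show()` so the script runs without a display. Actual numbers vary by machine and matplotlib version.

```python
import time

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend instead of plt.show()
import matplotlib.pyplot as plt
import pandas as pd


def time_scatter(n):
    """Return the seconds taken to build and render a scatter plot of n points."""
    data = pd.DataFrame({"x": range(n), "y": range(n)})
    fig, ax = plt.subplots()
    start = time.perf_counter()
    ax.scatter(data["x"], data["y"])
    fig.canvas.draw()  # force rendering so the drawing cost is included
    elapsed = time.perf_counter() - start
    plt.close(fig)
    return elapsed


# Hypothetical sizes, each ten times larger than the previous one.
timings = {n: time_scatter(n) for n in (1_000, 10_000, 100_000)}
for n, seconds in timings.items():
    print(f"n={n:>7}: {seconds:.4f} s")
```

At small n, fixed costs such as setting up the figure and axes tend to dominate, so the linear trend usually becomes visible only at the larger sizes.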
[X] Wrong: "Plotting a scatter plot takes the same time no matter how many points there are."
[OK] Correct: Each point must be drawn, so more points mean more work and more time.
Understanding how plotting time grows helps you explain performance when working with large datasets in real projects.
"What if we used a sampling method to plot only a fraction of points? How would the time complexity change?"
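To explore this question, one option is to sample rows with pandas before plotting. The sketch below assumes a 10% sample via `DataFrame.sample`; the fraction and the `random_state` value are illustrative choices, and the `Agg` backend is used so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

n = 1000
data = pd.DataFrame({"x": range(n), "y": range(n)})

# Keep a random 10% of the rows; random_state makes the draw repeatable.
sampled = data.sample(frac=0.1, random_state=0)

fig, ax = plt.subplots()
ax.scatter(sampled["x"], sampled["y"])  # only the sampled rows are drawn
print(len(sampled), "of", n, "points plotted")
plt.close(fig)
```

With a fixed fraction, the number of plotted points still grows in proportion to n, so plotting remains O(n) with a smaller constant factor. With a fixed sample size (the `n=` argument of `sample`), the plotting work is bounded by that size instead. The sampling step has its own cost, which this sketch does not measure.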