Categorical scatter with jitter in Matplotlib - Time & Space Complexity
We want to understand how the time to draw a categorical scatter plot with jitter changes as we add more data points.
How does the plotting time grow when the number of points increases?
Analyze the time complexity of the following code snippet.
import matplotlib.pyplot as plt
import numpy as np

categories = ['A', 'B', 'C']
n = 1000

# Assign each point a random category label and a random y-value
x = np.random.choice(categories, n)
y = np.random.randn(n)

# Small uniform offset in [-0.05, 0.05) so points in the same category don't overlap
jitter = (np.random.rand(n) - 0.5) * 0.1

# Place each point at its category's integer position, shifted by its own jitter
# (categories.index is effectively constant-time here because the category list is fixed)
plt.scatter([categories.index(cat) + jitter[i] for i, cat in enumerate(x)], y)
plt.show()
This code plots points for categories A, B, and C with a small random shift (jitter) to avoid overlap.
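As a side note, the per-point placement can also be done with a dictionary lookup instead of repeated `categories.index` calls. This is a minimal sketch (the `cat_to_idx` mapping is our own helper, not part of the original code) that performs the same O(n) work:

```python
import numpy as np

categories = ['A', 'B', 'C']
n = 1000
x = np.random.choice(categories, n)

# Map each label to its integer position once; dict lookup is O(1) per point
cat_to_idx = {cat: i for i, cat in enumerate(categories)}
x_numeric = np.array([cat_to_idx[cat] for cat in x], dtype=float)

# Same jitter as before: uniform offsets in [-0.05, 0.05)
x_numeric += (np.random.rand(n) - 0.5) * 0.1
```

Either way, each point still costs a fixed amount of work, so the growth pattern is unchanged.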
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Loop over all data points to assign jitter and plot each point.
- How many times: Once for each of the n points.
Each additional point adds a small fixed amount of work to place and draw it.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 point placements and draws |
| 100 | About 100 point placements and draws |
| 1000 | About 1000 point placements and draws |
Pattern observation: The work grows directly with the number of points.
Time Complexity: O(n)
This means the time to create the plot grows linearly: doubling the number of points roughly doubles the plotting time.
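You can sanity-check this empirically by timing just the placement step at increasing sizes. This is a rough sketch (absolute numbers depend on your machine; the `place_points` helper is ours, extracted from the snippet above):

```python
import time
import numpy as np

categories = ['A', 'B', 'C']

def place_points(n):
    """The O(n) placement step: category index + jitter for each of n points."""
    x = np.random.choice(categories, n)
    jitter = (np.random.rand(n) - 0.5) * 0.1
    return [categories.index(cat) + jitter[i] for i, cat in enumerate(x)]

timings = {}
for n in (10_000, 100_000, 1_000_000):
    start = time.perf_counter()
    place_points(n)
    timings[n] = time.perf_counter() - start
    print(f"n={n:>9,}: {timings[n]:.4f}s")
```

On a typical machine the measured times grow roughly in proportion to n, matching the O(n) analysis.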
[X] Wrong: "Adding jitter makes the plot take much longer because it's more complex than just plotting points."
[OK] Correct: The jitter is just a small constant-time calculation per point; it adds no extra loops or nested work, so it does not change the overall growth pattern.
Understanding how plotting time grows helps you handle larger datasets smoothly and shows you can think about performance even in visualization tasks.
"What if we added a nested loop to compare each point to every other point for overlap detection? How would the time complexity change?"
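To sketch one possible answer: a naive overlap check compares every pair of points, so the nested loops do about n(n-1)/2 comparisons and the complexity becomes O(n²). The `count_overlaps` helper below is a hypothetical illustration, not part of the original snippet:

```python
def count_overlaps(xs, ys, threshold=0.05):
    """Naive pairwise overlap check: compares every pair of points, O(n^2)."""
    n = len(xs)
    overlaps = 0
    for i in range(n):              # outer loop: n iterations
        for j in range(i + 1, n):   # inner loop: up to n - 1 iterations
            if abs(xs[i] - xs[j]) < threshold and abs(ys[i] - ys[j]) < threshold:
                overlaps += 1
    return overlaps

xs = [0.0, 0.01, 1.0]
ys = [0.0, 0.02, 1.0]
print(count_overlaps(xs, ys))  # only the first two points fall within the threshold → 1
```

Doubling n now roughly quadruples the work, so overlap detection would dominate the plotting time for large datasets.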