Categorical scatter with jitter in Matplotlib - Time & Space Complexity
We want to understand how the time to draw a categorical scatter plot with jitter changes as we add more data points.
How does the plotting time grow when the number of points increases?
Analyze the time complexity of the following code snippet.
import matplotlib.pyplot as plt
import numpy as np

categories = ['A', 'B', 'C']
n = 1000

# Assign each point a random category label and a random y-value
x = np.random.choice(categories, n)
y = np.random.randn(n)

# Small uniform offset in [-0.05, 0.05) so points in the same category don't overlap
jitter = (np.random.rand(n) - 0.5) * 0.1

# Place each point at its category's integer position, shifted by its own jitter
# (categories.index is effectively constant-time here because the category list is fixed)
plt.scatter([categories.index(cat) + jitter[i] for i, cat in enumerate(x)], y)
plt.show()
This code plots points for categories A, B, and C with a small random shift (jitter) to avoid overlap.
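As a side note, the per-point placement can also be done with a dictionary lookup instead of repeated `categories.index` calls. This is a minimal sketch (the `cat_to_idx` mapping is our own helper, not part of the original code) that performs the same O(n) work:

```python
import numpy as np

categories = ['A', 'B', 'C']
n = 1000
x = np.random.choice(categories, n)

# Map each label to its integer position once; dict lookup is O(1) per point
cat_to_idx = {cat: i for i, cat in enumerate(categories)}
x_numeric = np.array([cat_to_idx[cat] for cat in x], dtype=float)

# Same jitter as before: uniform offsets in [-0.05, 0.05)
x_numeric += (np.random.rand(n) - 0.5) * 0.1
```

Either way, each point still costs a fixed amount of work, so the growth pattern is unchanged.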
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Loop over all data points to assign jitter and plot each point.
- How many times: Once for each of the n points.
Each additional point adds a small fixed amount of work to place and draw it.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 point placements and draws |
| 100 | About 100 point placements and draws |
| 1000 | About 1000 point placements and draws |
Pattern observation: The work grows directly with the number of points.
Time Complexity: O(n)
This means the time to create the plot grows linearly: doubling the number of points roughly doubles the plotting time.
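You can sanity-check this empirically by timing just the placement step at increasing sizes. This is a rough sketch (absolute numbers depend on your machine; the `place_points` helper is ours, extracted from the snippet above):

```python
import time
import numpy as np

categories = ['A', 'B', 'C']

def place_points(n):
    """The O(n) placement step: category index + jitter for each of n points."""
    x = np.random.choice(categories, n)
    jitter = (np.random.rand(n) - 0.5) * 0.1
    return [categories.index(cat) + jitter[i] for i, cat in enumerate(x)]

timings = {}
for n in (10_000, 100_000, 1_000_000):
    start = time.perf_counter()
    place_points(n)
    timings[n] = time.perf_counter() - start
    print(f"n={n:>9,}: {timings[n]:.4f}s")
```

On a typical machine the measured times grow roughly in proportion to n, matching the O(n) analysis.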
[X] Wrong: "Adding jitter makes the plot take much longer because it's more complex than just plotting points."
[OK] Correct: The jitter is just a small constant-time calculation per point; it adds no extra loops or nested work, so it does not change the overall growth pattern.
Understanding how plotting time grows helps you handle larger datasets smoothly and shows you can think about performance even in visualization tasks.
"What if we added a nested loop to compare each point to every other point for overlap detection? How would the time complexity change?"
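To sketch one possible answer: a naive overlap check compares every pair of points, so the nested loops do about n(n-1)/2 comparisons and the complexity becomes O(n²). The `count_overlaps` helper below is a hypothetical illustration, not part of the original snippet:

```python
def count_overlaps(xs, ys, threshold=0.05):
    """Naive pairwise overlap check: compares every pair of points, O(n^2)."""
    n = len(xs)
    overlaps = 0
    for i in range(n):              # outer loop: n iterations
        for j in range(i + 1, n):   # inner loop: up to n - 1 iterations
            if abs(xs[i] - xs[j]) < threshold and abs(ys[i] - ys[j]) < threshold:
                overlaps += 1
    return overlaps

xs = [0.0, 0.01, 1.0]
ys = [0.0, 0.02, 1.0]
print(count_overlaps(xs, ys))  # only the first two points fall within the threshold → 1
```

Doubling n now roughly quadruples the work, so overlap detection would dominate the plotting time for large datasets.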