Trend lines on scatter plots in Matplotlib - Time & Space Complexity
We want to understand how the time to draw a trend line on a scatter plot grows as the number of points increases.
Analyze the time complexity of the following code snippet.
```python
import matplotlib.pyplot as plt
import numpy as np

n = 100  # example value for n
x = np.random.rand(n)
y = np.random.rand(n)

plt.scatter(x, y)                       # draw the n points
coeffs = np.polyfit(x, y, 1)            # fit a degree-1 (straight) trend line
plt.plot(x, coeffs[0] * x + coeffs[1])  # draw the fitted line
plt.show()
```
This code creates a scatter plot of n points and fits a straight line (trend line) through them.
Identify the loops, recursion, and array traversals that repeat.
- Primary operation: calculating the best-fit line with np.polyfit, which processes all n points.
- How many times: each of the n points is examined once to compute the line coefficients.
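To make the single pass over the data explicit, here is a minimal hand-rolled sketch of the least-squares arithmetic that a degree-1 fit performs. (np.polyfit uses a more numerically robust method internally, but it touches the same n points.)

```python
import numpy as np

def fit_line(x, y):
    """Least-squares slope and intercept in one pass over the data.

    A simplified version of what np.polyfit(x, y, 1) computes,
    written to make the O(n) traversal visible.
    """
    n = len(x)
    sx = sy = sxx = sxy = 0.0
    for xi, yi in zip(x, y):  # each point is visited exactly once
        sx += xi
        sy += yi
        sxx += xi * xi
        sxy += xi * yi
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1          # exactly linear data
print(fit_line(x, y))  # -> (2.0, 1.0)
```

The four running sums are each updated once per point, which is why the total work scales with n.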
As the number of points n increases, the time to compute the trend line grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 operations |
| 100 | About 100 operations |
| 1000 | About 1000 operations |
Pattern observation: Doubling the points roughly doubles the work needed to find the trend line.
Time Complexity: O(n)
This means the time to compute the trend line grows linearly with the number of points.
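One way to check the linear pattern empirically is to time np.polyfit at increasing sizes. Absolute timings vary by machine, so treat the ratios as rough; a tenfold increase in n should raise the time by roughly tenfold, not a hundredfold.

```python
import time
import numpy as np

rng = np.random.default_rng(0)

for n in (10_000, 100_000, 1_000_000):
    x = rng.random(n)
    y = 3 * x + rng.normal(0, 0.1, n)
    start = time.perf_counter()
    np.polyfit(x, y, 1)
    elapsed = time.perf_counter() - start
    print(f"n={n:>9}: {elapsed:.4f} s")
```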
[X] Wrong: "Adding more points won't affect the time much because the line is just one line."
[OK] Correct: Even though the line is simple, the calculation must consider every point to find the best fit, so more points mean more work.
Understanding how data size affects plotting and calculations helps you explain performance clearly and shows you think about efficiency in real tasks.
What if we changed the trend line to a polynomial of degree 3? How would the time complexity change?
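As a hint, here is a sketch of the cubic case (the data below is made up for illustration). For a fixed degree d, the fit still makes one pass over all n points to build the design matrix, so the time is still O(n); the extra cost of the larger (d+1)-column matrix and the (d+1)-by-(d+1) solve is constant with respect to n.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.random(n)
y = 2 * x**3 - x + rng.normal(0, 0.01, n)  # noisy cubic data

# Degree 3 instead of degree 1: still one pass over the n points,
# so the growth in n is unchanged.
coeffs = np.polyfit(x, y, 3)
print(coeffs)  # four coefficients, highest degree first
```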