Linear regression with np.polyfit() in NumPy - Time & Space Complexity
We want to understand how the time it takes to run linear regression with np.polyfit() changes as the data size grows.
Specifically, how does the work increase when we add more points to fit?
Analyze the time complexity of the following code snippet.
```python
import numpy as np

n = 100  # example size
x = np.arange(n)
y = 3 * x + 2 + np.random.randn(n) * 0.5  # noisy points along y = 3x + 2
coefficients = np.polyfit(x, y, 1)  # fit a degree-1 polynomial (a line)
```
This code fits a straight line to n points using np.polyfit().
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: computing sums and products over all n points to build the matrices used in the least-squares fit.
- How many times: each of the n points is processed a constant number of times to compute these quantities.
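For a degree-1 fit, those per-point sums can be written out explicitly. The sketch below uses the closed-form normal-equation solution for a line as an illustration of the one-pass-over-n-points work; note that np.polyfit itself solves the least-squares problem differently (via an SVD-based solver), but it too must touch every point:

```python
import numpy as np

def fit_line_manual(x, y):
    # One pass over the n points accumulates the four sums that
    # the normal equations for a straight-line fit require.
    n = len(x)
    sx, sy = x.sum(), y.sum()
    sxx, sxy = (x * x).sum(), (x * y).sum()
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

rng = np.random.default_rng(0)
x = np.arange(1000, dtype=float)
y = 3 * x + 2 + rng.standard_normal(1000) * 0.5
slope, intercept = fit_line_manual(x, y)
print(slope, intercept)
```

Each sum is a single traversal of the n points, which is where the linear growth in work comes from.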
As the number of points n increases, the work grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 operations to sum values |
| 100 | About 100 operations to sum values |
| 1000 | About 1000 operations to sum values |
Pattern observation: Doubling the data roughly doubles the work needed.
Time Complexity: O(n)
This means the time to fit the line grows linearly with the number of data points.
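To see the linear trend empirically, here is a quick timing sketch (the absolute numbers are machine-dependent and the sizes are chosen only for illustration, but the measured times should grow roughly in proportion to n):

```python
import time
import numpy as np

# Fit lines to progressively larger datasets and time each fit.
for n in (10_000, 100_000, 1_000_000):
    x = np.arange(n, dtype=float)
    y = 3 * x + 2 + np.random.randn(n) * 0.5
    start = time.perf_counter()
    coefficients = np.polyfit(x, y, 1)
    elapsed = time.perf_counter() - start
    print(f"n={n:>9}: {elapsed:.4f} s")
```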
[X] Wrong: "Fitting a line with np.polyfit() takes the same time no matter how many points there are."
[OK] Correct: The function must look at every point to calculate sums, so more points mean more work and longer time.
Knowing how the time grows with data size helps you explain performance in real projects and shows you understand how algorithms scale.
"What if we fit a polynomial of degree 3 instead of 1? How would the time complexity change?"