Least squares optimization in SciPy - Time & Space Complexity
We want to understand how the time needed to solve a least squares problem grows as the data size increases.
How does the number of calculations change when we have more data points or variables?
Analyze the time complexity of the following code snippet.
```python
import numpy as np
from scipy.optimize import least_squares

def model(x, t):
    return x[0] * np.exp(-x[1] * t)

t = np.linspace(0, 10, 100)
y = model([2.5, 1.3], t) + 0.1 * np.random.randn(100)
res = least_squares(lambda x: model(x, t) - y, x0=[1, 1])
```
This code fits a model to data by minimizing the difference between the model and observed points using least squares.
Identify the repeated work: loops, recursion, and array traversals.
- Primary operation: Calculating residuals (differences) for all data points in each iteration.
- How many times: For each iteration, the residuals are computed over all data points; iterations repeat until convergence.
As the number of data points grows, the calculations for residuals increase proportionally each iteration.
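One way to see this per-iteration cost directly is to count how many times the solver calls the residual function and confirm that each call touches every data point. This is a minimal sketch; the `calls` counter dictionary is introduced here for illustration and is not part of SciPy.

```python
import numpy as np
from scipy.optimize import least_squares

def model(x, t):
    return x[0] * np.exp(-x[1] * t)

n = 100
t = np.linspace(0, 10, n)
rng = np.random.default_rng(0)
y = model([2.5, 1.3], t) + 0.1 * rng.standard_normal(n)

# `calls` is a counter added for this sketch, not part of SciPy.
calls = {"count": 0}

def residuals(x):
    calls["count"] += 1
    return model(x, t) - y   # one subtraction per data point: O(n) work per call

res = least_squares(residuals, x0=[1, 1])
print(calls["count"], len(res.fun))   # solver called residuals many times; each call produced n values
```

Here `res.fun` holds the residual vector at the solution, so its length equals the number of data points, while the counter shows the solver evaluated those residuals repeatedly on its way to convergence.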
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 residual calculations per iteration |
| 100 | About 100 residual calculations per iteration |
| 1000 | About 1000 residual calculations per iteration |
Pattern observation: The work grows roughly in direct proportion to the number of data points.
Time Complexity: O(n * k)
This means the total time grows linearly with the number of data points (n), multiplied by the number of iterations (k) the solver needs to converge on the best fit.
[X] Wrong: "The time depends only on the number of data points, not on the number of iterations."
[OK] Correct: Each iteration requires recalculating residuals for all points, so more iterations multiply the total work.
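The result object returned by `least_squares` records how many residual evaluations were performed in its `nfev` attribute, so the total work can be estimated as roughly n × nfev. A short sketch of that estimate:

```python
import numpy as np
from scipy.optimize import least_squares

def model(x, t):
    return x[0] * np.exp(-x[1] * t)

n = 100
t = np.linspace(0, 10, n)
rng = np.random.default_rng(1)
y = model([2.5, 1.3], t) + 0.1 * rng.standard_normal(n)

res = least_squares(lambda x: model(x, t) - y, x0=[1, 1])

# ~ O(n * k): n points per evaluation, res.nfev evaluations in total.
total_residual_ops = n * res.nfev
print(res.nfev, total_residual_ops)
```

Doubling n doubles the cost of every evaluation, while a harder-to-fit problem that needs more iterations raises `nfev`; either factor alone increases the total work.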
Understanding how least squares optimization scales helps you explain performance when fitting models to data, a common task in data science and machine learning.
"What if the model had more parameters to estimate? How would the time complexity change?"
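As a hedged sketch of one possible answer: with p parameters, each iteration must also build a Jacobian with one row per data point and one column per parameter, so the per-iteration work grows to roughly O(n · p). The three-parameter model below (`model3`, adding a constant offset) is a hypothetical example introduced only to illustrate this.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical three-parameter model: amplitude, decay rate, and offset.
def model3(x, t):
    return x[0] * np.exp(-x[1] * t) + x[2]

n = 100
t = np.linspace(0, 10, n)
rng = np.random.default_rng(2)
y = model3([2.5, 1.3, 0.5], t) + 0.1 * rng.standard_normal(n)

res = least_squares(lambda x: model3(x, t) - y, x0=[1, 1, 0])

# The Jacobian has one row per data point and one column per parameter,
# so each iteration does on the order of n * p extra work.
print(res.jac.shape)
```

With n = 100 points and p = 3 parameters, the Jacobian has shape (100, 3), so each added parameter adds another column of n entries to compute every iteration.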