Least squares (least_squares) in SciPy - Time & Space Complexity
We want to understand how the time needed to solve a least squares problem grows as the problem size increases.
Specifically, how does the solver's work change when we have more data points or variables?
Analyze the time complexity of the following code snippet.
from scipy.optimize import least_squares
import numpy as np

def fun(x, A, b):
    # Residual vector A @ x - b: one matrix-vector product, costing work proportional to m * n
    return A @ x - b

A = np.random.rand(1000, 10)   # m = 1000 data points (rows), n = 10 variables (columns)
b = np.random.rand(1000)
x0 = np.zeros(10)              # initial guess for the 10 variables
res = least_squares(fun, x0, args=(A, b))
This code solves a linear least squares problem: it finds the vector x that minimizes the sum of squared residuals ||A x - b||^2.
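Before counting operations, it helps to look at what the solver reports back, since the result object records how much repeated work was done. A minimal sketch, continuing directly from the snippet above:

```python
# Continuing from the snippet above: inspect the OptimizeResult returned by least_squares
print(res.x)     # the fitted vector of 10 variables
print(res.cost)  # 0.5 * sum of squared residuals at the solution
print(res.nfev)  # how many times the residual function was evaluated
```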
Identify the loops, recursion, and array traversals that repeat.
- Primary operation: multiplying the matrix A by the current vector x (and subtracting b) to form the residuals.
- How many times: once per residual evaluation, and the solver performs many such evaluations (including those used to estimate the Jacobian) as it iterates toward the solution.
As the number of rows (data points, m) or columns (variables, n) in A grows, the work for each multiplication and update grows proportionally.
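One way to see this repetition directly is to count how many times the solver calls the residual function, and therefore how many times A is multiplied by x. A small sketch; the counter wrapper is only for illustration and is not part of the SciPy API:

```python
from scipy.optimize import least_squares
import numpy as np

calls = {"count": 0}

def counted_fun(x, A, b):
    # Each call performs one matrix-vector product of cost proportional to m * n
    calls["count"] += 1
    return A @ x - b

A = np.random.rand(1000, 10)
b = np.random.rand(1000)
res = least_squares(counted_fun, np.zeros(10), args=(A, b))

# The raw count also includes evaluations used to estimate the Jacobian by
# finite differences, so it is typically larger than res.nfev.
print(calls["count"], res.nfev)
```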
| Input Size (m rows, n = 10 columns) | Approx. Operations |
|---|---|
| 10 | Thousands |
| 100 | Hundreds of thousands |
| 1000 | Millions |
Pattern observation: the work per residual evaluation grows roughly with the product of rows and columns (m × n), and the solver repeats that work every iteration, so bigger problems take much more time.
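To see where numbers like those in the table come from, here is a rough back-of-the-envelope count. The 2 * m * n flops per residual evaluation, the n + 1 evaluations per iteration for a finite-difference Jacobian, and the k = 20 iterations are illustrative assumptions, not values taken from SciPy internals:

```python
def approx_ops(m, n, k=20):
    residual_ops = 2 * m * n                 # one evaluation of A @ x - b
    per_iteration = (n + 1) * residual_ops   # residuals plus a finite-difference Jacobian
    return k * per_iteration

for m in (10, 100, 1000):
    print(m, approx_ops(m, n=10))   # ~44 thousand, ~440 thousand, ~4.4 million
```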
Time Complexity: O(k * m * n)
This means the time grows with the number of iterations k, the number of data points m, and the number of variables n, counting the repeated residual evaluations. (Strictly, estimating the Jacobian by finite differences and solving the trust-region subproblem each add roughly another factor of n per iteration, so the total is closer to O(k * m * n^2) when n is large; with n fixed at 10 here, the growth is effectively linear in m.)
[X] Wrong: "The solver runs in constant time no matter how big the data is."
[OK] Correct: The solver must process all data points and variables multiple times, so bigger problems take more time.
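A small timing experiment makes this concrete; the sizes are arbitrary and the timings depend on the machine, so treat this as a sketch rather than a benchmark:

```python
import time
import numpy as np
from scipy.optimize import least_squares

def fun(x, A, b):
    return A @ x - b

for m in (1_000, 10_000, 100_000):    # grow the number of data points, keep n = 10
    A = np.random.rand(m, 10)
    b = np.random.rand(m)
    start = time.perf_counter()
    least_squares(fun, np.zeros(10), args=(A, b))
    print(m, f"{time.perf_counter() - start:.3f} s")
```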
Understanding how least squares solvers scale helps you explain performance in real data fitting tasks.
This skill shows you can think about how algorithms behave with bigger data, a key part of data science work.
"What if we changed the solver to use a sparse matrix for A? How would the time complexity change?"