Linear Regression Basics in Python Data Analysis - Time & Space Complexity
We want to understand how the time it takes to run linear regression changes as we get more data.
Specifically, how does the work grow when we add more rows or features?
Analyze the time complexity of the following code snippet.
```python
import numpy as np

def linear_regression(X, y):
    # Add a column of ones so the model learns an intercept (bias) term
    X_b = np.c_[np.ones((len(X), 1)), X]
    # Normal equation: theta = (X^T X)^(-1) X^T y
    theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
    return theta_best
```
This code fits a linear regression model by calculating the best parameters using the normal equation.
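As a quick sanity check, here is a minimal sketch (the toy data below is illustrative, not from the original) that fits a tiny noiseless dataset generated from y = 4 + 3x, so the recovered parameters should match the true intercept and slope:

```python
import numpy as np

def linear_regression(X, y):
    X_b = np.c_[np.ones((len(X), 1)), X]  # add bias term
    return np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

# Four points on the line y = 4 + 3x, with no noise
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 4 + 3 * X[:, 0]

theta = linear_regression(X, y)
print(theta)  # approximately [4. 3.] -> intercept 4, slope 3
```

Because the data is exactly linear, the normal equation recovers the true parameters up to floating-point error.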
Identify the repeated work: loops, recursion, or array traversals.
- Primary operation: Matrix multiplication and matrix inversion.
- How many times: Each operation runs once, but it touches every data point and every feature.
As the number of data points (rows) and features (columns) grows, the matrix operations take more time.
| Input Size (n rows, d features) | Approx. Operations |
|---|---|
| 10 rows, 2 features | Few hundred operations |
| 100 rows, 5 features | Thousands of operations |
| 1000 rows, 10 features | Hundreds of thousands of operations |
Pattern observation: The time grows quickly with the number of features due to matrix inversion, and also grows with rows due to matrix multiplication.
Time Complexity: O(n d^2 + d^3)
This means the time grows roughly with the number of data points times the square of the number of features (forming X^T X), plus the cube of the number of features (inverting it).
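The rough counts in the table above can be reproduced from this formula. A small sketch, assuming we only tally the dominant multiply terms and count the bias column as an extra feature (so d = features + 1):

```python
# Rough operation-count estimate for the normal equation:
# forming X^T X costs about n * d^2 multiplies, inverting it about d^3.
def approx_ops(n_rows, n_features):
    d = n_features + 1  # +1 for the bias column
    return n_rows * d ** 2 + d ** 3

for n, f in [(10, 2), (100, 5), (1000, 10)]:
    print(n, f, approx_ops(n, f))
# 10 rows, 2 features  ->    117 (roughly a hundred)
# 100 rows, 5 features ->   3816 (thousands)
# 1000 rows, 10 feats  -> 122331 (hundreds of thousands)
```

These are order-of-magnitude estimates, not exact flop counts, but they match the pattern in the table.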
[X] Wrong: "The time depends only on the number of data points (rows)."
[OK] Correct: The number of features (columns) also greatly affects runtime, because the cost of matrix inversion depends on the feature count, not just the row count.
Understanding how data size affects linear regression helps you explain model training time clearly and shows you know what happens behind the scenes.
"What if we used gradient descent instead of the normal equation? How would the time complexity change?"
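One hedged sketch of an answer: batch gradient descent avoids the d^3 inversion entirely. Each iteration costs O(n d) for the gradient, so k iterations cost O(k n d) overall. The function name, learning rate, and iteration count below are illustrative assumptions, not from the original:

```python
import numpy as np

def linear_regression_gd(X, y, lr=0.1, n_iters=1000):
    # Each iteration costs O(n * d): one matrix-vector product for the
    # predictions and one for the gradient. Total: O(n_iters * n * d).
    X_b = np.c_[np.ones((len(X), 1)), X]  # add bias term
    theta = np.zeros(X_b.shape[1])
    m = len(X_b)
    for _ in range(n_iters):
        gradient = (2.0 / m) * X_b.T.dot(X_b.dot(theta) - y)
        theta -= lr * gradient
    return theta

# Same toy line y = 4 + 3x; gradient descent should converge near [4, 3]
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 4 + 3 * X[:, 0]
print(linear_regression_gd(X, y))
```

The trade-off: gradient descent scales better when d is large (no d^3 term), but the total cost now depends on how many iterations convergence takes, which in turn depends on the learning rate and the conditioning of the data.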