Linear Regression Basics in Python Data Analysis - Time & Space Complexity
We want to understand how the time it takes to run linear regression changes as we get more data.
Specifically, how does the work grow when we add more rows or features?
Analyze the time complexity of the following code snippet.
```python
import numpy as np

def linear_regression(X, y):
    # Add a column of ones so the model learns an intercept (bias) term
    X_b = np.c_[np.ones((len(X), 1)), X]
    # Normal equation: theta = (X^T X)^(-1) X^T y
    theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
    return theta_best
```
This code fits a linear regression model by calculating the best parameters using the normal equation.
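As a quick sanity check, here is a minimal sketch (the toy data below is illustrative, not from the original) that fits a tiny noiseless dataset generated from y = 4 + 3x, so the recovered parameters should match the true intercept and slope:

```python
import numpy as np

def linear_regression(X, y):
    X_b = np.c_[np.ones((len(X), 1)), X]  # add bias term
    return np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

# Four points on the line y = 4 + 3x, with no noise
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 4 + 3 * X[:, 0]

theta = linear_regression(X, y)
print(theta)  # approximately [4. 3.] -> intercept 4, slope 3
```

Because the data is exactly linear, the normal equation recovers the true parameters up to floating-point error.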
Identify the repeated work: loops, recursion, or array traversals.
- Primary operation: Matrix multiplication and matrix inversion.
- How many times: Each operation runs once, but it touches every data point and every feature.
As the number of data points (rows) and features (columns) grows, the matrix operations take more time.
| Input Size (n rows, d features) | Approx. Operations |
|---|---|
| 10 rows, 2 features | Few hundred operations |
| 100 rows, 5 features | Thousands of operations |
| 1000 rows, 10 features | Hundreds of thousands of operations |
Pattern observation: The time grows quickly with the number of features due to matrix inversion, and also grows with rows due to matrix multiplication.
Time Complexity: O(n d^2 + d^3)
This means the time grows roughly with the number of data points times the square of the number of features (forming X^T X), plus the cube of the number of features (inverting it).
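The rough counts in the table above can be reproduced from this formula. A small sketch, assuming we only tally the dominant multiply terms and count the bias column as an extra feature (so d = features + 1):

```python
# Rough operation-count estimate for the normal equation:
# forming X^T X costs about n * d^2 multiplies, inverting it about d^3.
def approx_ops(n_rows, n_features):
    d = n_features + 1  # +1 for the bias column
    return n_rows * d ** 2 + d ** 3

for n, f in [(10, 2), (100, 5), (1000, 10)]:
    print(n, f, approx_ops(n, f))
# 10 rows, 2 features  ->    117 (roughly a hundred)
# 100 rows, 5 features ->   3816 (thousands)
# 1000 rows, 10 feats  -> 122331 (hundreds of thousands)
```

These are order-of-magnitude estimates, not exact flop counts, but they match the pattern in the table.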
[X] Wrong: "The time depends only on the number of data points (rows)."
[OK] Correct: The number of features (columns) also greatly affects runtime, because the cost of matrix inversion depends on the feature count, not just the row count.
Understanding how data size affects linear regression helps you explain model training time clearly and shows you know what happens behind the scenes.
"What if we used gradient descent instead of the normal equation? How would the time complexity change?"
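One hedged sketch of an answer: batch gradient descent avoids the d^3 inversion entirely. Each iteration costs O(n d) for the gradient, so k iterations cost O(k n d) overall. The function name, learning rate, and iteration count below are illustrative assumptions, not from the original:

```python
import numpy as np

def linear_regression_gd(X, y, lr=0.1, n_iters=1000):
    # Each iteration costs O(n * d): one matrix-vector product for the
    # predictions and one for the gradient. Total: O(n_iters * n * d).
    X_b = np.c_[np.ones((len(X), 1)), X]  # add bias term
    theta = np.zeros(X_b.shape[1])
    m = len(X_b)
    for _ in range(n_iters):
        gradient = (2.0 / m) * X_b.T.dot(X_b.dot(theta) - y)
        theta -= lr * gradient
    return theta

# Same toy line y = 4 + 3x; gradient descent should converge near [4, 3]
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 4 + 3 * X[:, 0]
print(linear_regression_gd(X, y))
```

The trade-off: gradient descent scales better when d is large (no d^3 term), but the total cost now depends on how many iterations convergence takes, which in turn depends on the learning rate and the conditioning of the data.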