Linear regression with np.polyfit() in NumPy - Time & Space Complexity
We want to understand how the time it takes to run linear regression with np.polyfit() changes as the data size grows.
Specifically, how does the work increase when we add more points to fit?
Analyze the time complexity of the following code snippet.
```python
import numpy as np

n = 100  # example size
x = np.arange(n)
y = 3 * x + 2 + np.random.randn(n) * 0.5  # noisy points along y = 3x + 2
coefficients = np.polyfit(x, y, 1)  # fit a degree-1 polynomial (a line)
```
This code fits a straight line to n points using np.polyfit().
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: computing sums and products over all n points to build the matrices used in the least-squares fit.
- How many times: each of the n points is processed a constant number of times to compute these quantities.
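For a degree-1 fit, those per-point sums can be written out explicitly. The sketch below uses the closed-form normal-equation solution for a line as an illustration of the one-pass-over-n-points work; note that np.polyfit itself solves the least-squares problem differently (via an SVD-based solver), but it too must touch every point:

```python
import numpy as np

def fit_line_manual(x, y):
    # One pass over the n points accumulates the four sums that
    # the normal equations for a straight-line fit require.
    n = len(x)
    sx, sy = x.sum(), y.sum()
    sxx, sxy = (x * x).sum(), (x * y).sum()
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

rng = np.random.default_rng(0)
x = np.arange(1000, dtype=float)
y = 3 * x + 2 + rng.standard_normal(1000) * 0.5
slope, intercept = fit_line_manual(x, y)
print(slope, intercept)
```

Each sum is a single traversal of the n points, which is where the linear growth in work comes from.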
As the number of points n increases, the work grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 operations to sum values |
| 100 | About 100 operations to sum values |
| 1000 | About 1000 operations to sum values |
Pattern observation: Doubling the data roughly doubles the work needed.
Time Complexity: O(n)
This means the time to fit the line grows linearly with the number of data points.
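To see the linear trend empirically, here is a quick timing sketch (the absolute numbers are machine-dependent and the sizes are chosen only for illustration, but the measured times should grow roughly in proportion to n):

```python
import time
import numpy as np

# Fit lines to progressively larger datasets and time each fit.
for n in (10_000, 100_000, 1_000_000):
    x = np.arange(n, dtype=float)
    y = 3 * x + 2 + np.random.randn(n) * 0.5
    start = time.perf_counter()
    coefficients = np.polyfit(x, y, 1)
    elapsed = time.perf_counter() - start
    print(f"n={n:>9}: {elapsed:.4f} s")
```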
[X] Wrong: "Fitting a line with np.polyfit() takes the same time no matter how many points there are."
[OK] Correct: The function must look at every point to calculate sums, so more points mean more work and longer time.
Knowing how the time grows with data size helps you explain performance in real projects and shows you understand how algorithms scale.
"What if we fit a polynomial of degree 3 instead of 1? How would the time complexity change?"