0
0
SciPydata~5 mins

Goodness of fit evaluation in SciPy - Time & Space Complexity

Choose your learning style9 modes available
Time Complexity: Goodness of fit evaluation
O(n)
Understanding Time Complexity

We want to know how the time needed to check how well a model fits data changes as the data size grows.

How does the cost of evaluating goodness of fit grow when we have more data points?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import numpy as np
from scipy.stats import chisquare

observed = np.array([20, 30, 50])
expected = np.array([25, 25, 50])

chi2_stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"Chi-square statistic: {chi2_stat}, p-value: {p_value}")

This code calculates the chi-square test statistic and p-value to see how well observed data matches expected data.

Identify Repeating Operations

Identify the loops, recursion, array traversals that repeat.

  • Primary operation: Element-wise subtraction, division, and squaring for each data point.
  • How many times: Once for each data point in the observed and expected arrays.
How Execution Grows With Input

Each data point requires a fixed number of calculations. More data points mean more calculations.

Input Size (n)Approx. Operations
10About 10 sets of calculations
100About 100 sets of calculations
1000About 1000 sets of calculations

Pattern observation: The number of operations grows directly with the number of data points.

Final Time Complexity

Time Complexity: O(n)

This means the time to evaluate goodness of fit grows linearly with the number of data points.

Common Mistake

[X] Wrong: "The test runs in constant time no matter how much data there is."

[OK] Correct: Each data point must be checked, so more data means more work and more time.

Interview Connect

Understanding how time grows with data size helps you explain your code choices clearly and shows you think about efficiency.

Self-Check

"What if we used a more complex goodness of fit test that compares pairs of data points? How would the time complexity change?"