
Stationarity and differencing in ML Python - Model Metrics & Evaluation

Which metric matters for Stationarity and differencing and WHY

For stationarity, the key metric is the Augmented Dickey-Fuller (ADF) test statistic and its p-value. The ADF test checks whether a time series has a unit root. Its null hypothesis is that the series is non-stationary. A low p-value (usually below 0.05) lets us reject that null and treat the series as stationary. This matters because many forecasting models, such as ARMA, assume a stationary series.

For differencing, the metric is the order of differencing d: how many times we replace each value with its change from the previous value (y_t - y_{t-1}) before the series becomes stationary. We want the smallest d that achieves stationarity without losing important information.

Confusion matrix or equivalent visualization

Stationarity testing is not a classification task, so no confusion matrix applies. Instead, we read the ADF test result table, for example:

    +----------------------+----------------+
    | Statistic            | -3.45          |
    | p-value              | 0.01           |
    | Critical Values (5%) | -2.86          |
    +----------------------+----------------+
    

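If the test statistic is below (more negative than) the 5% critical value, we reject the unit-root null hypothesis and treat the series as stationary; the check p-value < 0.05 expresses the same decision. In the example table, -3.45 < -2.86 and 0.01 < 0.05, so the series would be treated as stationary.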

Precision vs Recall tradeoff (or equivalent)

Here, the tradeoff is between under-differencing and over-differencing.

  • Under-differencing: The series remains non-stationary. Models may give biased or poor forecasts because trends or seasonality remain.
  • Over-differencing: The series becomes noisier (its variance increases), gains artificial negative autocorrelation, and loses meaningful patterns, making forecasts unstable.

The goal is to find the differencing order that is just enough to make the series stationary while keeping useful information.
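
A quick numeric illustration of over-differencing, assuming NumPy. White noise is already stationary (the right order is d = 0), so differencing it once is one difference too many. In theory, Var(e_t - e_{t-1}) = 2 Var(e_t) and the lag-1 autocorrelation of the differenced series is -0.5, so the variance doubles and a spurious negative correlation appears. The helper `lag1_autocorr` is written just for this example.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rng = np.random.default_rng(0)
noise = rng.normal(size=20_000)   # stationary already: the right order is d = 0
over = np.diff(noise)             # one unnecessary difference

print(f"variance:       {noise.var():.2f} -> {over.var():.2f}")                    # roughly 1 -> 2
print(f"lag-1 autocorr: {lag1_autocorr(noise):.2f} -> {lag1_autocorr(over):.2f}")  # roughly 0 -> -0.5
```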

What "good" vs "bad" metric values look like for this use case

Good:

  • ADF test p-value < 0.05, indicating stationarity.
  • Differencing order is as low as possible (often 0 or 1).
  • Time series plots show stable mean and variance after differencing.

Bad:

  • ADF test p-value > 0.05: no evidence that the series is stationary.
  • A high differencing order (2 or more) that leaves the data noisy.
  • Time series plots show trends or changing variance after differencing.
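
The "stable mean and variance" check can be backed by numbers as well as plots. A rough sketch, assuming NumPy: split the series into consecutive chunks and compare each chunk's mean and variance. The helper `window_stats` and the choice of four chunks are illustrative; this is a heuristic companion to the plots, not a formal test.

```python
import numpy as np

def window_stats(series, n_windows=4):
    """Hypothetical helper: (mean, variance) of consecutive, near-equal chunks."""
    chunks = np.array_split(np.asarray(series, dtype=float), n_windows)
    return [(float(c.mean()), float(c.var())) for c in chunks]

rng = np.random.default_rng(1)
walk = np.cumsum(rng.normal(size=2_000))   # random walk: wandering mean

for label, s in [("before differencing", walk), ("after 1st difference", np.diff(walk))]:
    print(label)
    for i, (m, v) in enumerate(window_stats(s)):
        print(f"  window {i}: mean={m:8.2f}  var={v:8.2f}")
```

Before differencing, the chunk means of the random walk wander; after one difference they should all sit near 0 with variances near 1.
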

Metrics pitfalls

  • Ignoring stationarity: Using models that assume stationarity on non-stationary data leads to bad forecasts.
  • Over-differencing: Differencing too many times can remove important signals and increase noise.
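  • Misinterpreting the ADF test: Failing to reject the null (p-value ≥ 0.05) does not prove the series is non-stationary, because the test has low power on short series or series close to a unit root. A p-value slightly above 0.05 calls for a look at plots and domain knowledge.
  • Data leakage: Each differenced value should depend only on current and past observations. Transformations that use future points, such as centered differences, leak information into training and bias evaluation results.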
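
A small demonstration of where the leakage comes from, assuming NumPy. `np.diff` computes the backward difference y[t] - y[t-1], which uses only current and past values, while `np.gradient` uses centered differences (y[t+1] - y[t-1]) / 2 at interior points and therefore depends on the next observation. The toy series is made up for illustration; changing only its last value shows which transform peeks ahead.

```python
import numpy as np

y = np.array([10.0, 12.0, 11.0, 15.0, 14.0])   # toy series, times t = 0..4

backward = np.diff(y)       # backward[k] = y[k+1] - y[k]: only current and past values
centered = np.gradient(y)   # interior points: (y[t+1] - y[t-1]) / 2, which needs y[t+1]

y_changed = y.copy()
y_changed[4] = 100.0        # alter only the last ("future") observation

# Value at t = 3: backward difference is y[3] - y[2] (index 2), centered is index 3
print(np.diff(y_changed)[2] == backward[2])      # True: unaffected by the future value
print(np.gradient(y_changed)[3] == centered[3])  # False: it used y[4]
```
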

Self-check question

Your time series model uses first-order differencing and the ADF test p-value is 0.07. Is your series stationary? Should you difference more?

Answer: A p-value of 0.07 is above 0.05, so at the 5% level we cannot reject the unit-root null: there is not enough evidence that the series is stationary. You may need to difference one more time or try other transformations, but check plots and domain knowledge before deciding.

Key Result
Use the Augmented Dickey-Fuller test p-value to check stationarity; choose the smallest differencing order that achieves stationarity without over-differencing.