
Time series evaluation metrics in ML Python - Model Metrics & Evaluation

Metrics & Evaluation - Time series evaluation metrics
Which metric matters for time series and WHY

In time series, we want to see how close our predictions are to actual values over time. Metrics like Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) tell us the average size of errors. MAE is simple and reports the average error in the same units as the data. RMSE gives more weight to big mistakes, so it helps catch big misses. Mean Absolute Percentage Error (MAPE) expresses error as a percentage, which helps compare series on different scales. Choosing the right metric depends on which mistakes matter more in your case.

Confusion matrix or equivalent visualization

Time series problems usually predict numbers, not categories, so a confusion matrix is not used. Instead, we look at error values over time. Here is an example of errors at 5 time points:

    Time:        1    2    3    4    5
    Actual:    100  150  130  170  160
    Predicted:  90  160  120  180  155
    Error:      10  -10   10  -10    5
    |Error|:    10   10   10   10    5
    

We then calculate metrics from the absolute errors: MAE = (10+10+10+10+5)/5 = 9 and RMSE = sqrt((10²+10²+10²+10²+5²)/5) = sqrt(85) ≈ 9.22.
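The arithmetic above can be checked with a few lines of plain Python (a quick sketch; the lesson itself includes no code):

```python
# Recompute MAE and RMSE for the 5-point example above.
actual    = [100, 150, 130, 170, 160]
predicted = [90, 160, 120, 180, 155]

abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]

mae = sum(abs_errors) / len(abs_errors)                            # mean absolute error
rmse = (sum(e ** 2 for e in abs_errors) / len(abs_errors)) ** 0.5  # root mean squared error

print(mae)              # 9.0
print(round(rmse, 2))   # 9.22
```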

Precision vs Recall tradeoff (or equivalent)

In time series, we don't use precision or recall because those are for classification. Instead, we balance between metrics that treat errors differently. For example:

  • MAE treats all errors equally, good when all mistakes matter the same.
  • RMSE punishes big errors more, useful when big misses are costly.
  • MAPE shows error as a percent, helpful when scale changes over time.

Choosing depends on your goal: avoid big mistakes or keep average error low.
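To see the tradeoff concretely, here is a small sketch (toy error values, not from the example above) where two error series have the same MAE but very different RMSE:

```python
# Two error series with the same average absolute error.
errors_even    = [10, 10, 10, 10, 10]   # consistent misses
errors_outlier = [2, 2, 2, 2, 42]       # mostly accurate, one big miss

def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    return (sum(e ** 2 for e in errors) / len(errors)) ** 0.5

print(mae(errors_even), mae(errors_outlier))              # 10.0 10.0 — identical MAE
print(rmse(errors_even), round(rmse(errors_outlier), 2))  # 10.0 18.87 — RMSE flags the outlier
```

If big misses are costly, the second series' much higher RMSE is the signal to watch, even though its MAE is identical.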

What "good" vs "bad" metric values look like

Good values mean small errors compared to the data size. For example, if your data values are around 100, an MAE of 2 means on average you miss by 2 units, which is good. An MAE of 50 means big errors, which is bad.

Similarly, RMSE is always at least as large as MAE. If the two are close, errors are consistent; a much larger RMSE than MAE means a few big mistakes dominate.

A MAPE below 10% is often considered good, meaning errors average less than 10% of the actual values. Above 50% is usually bad.
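For the earlier 5-point example, MAPE works out to about 6.7%, comfortably under the 10% guideline (a sketch of the calculation):

```python
actual    = [100, 150, 130, 170, 160]
predicted = [90, 160, 120, 180, 155]

# Mean absolute percentage error: average of |error| / actual, as a percent.
mape = sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual) * 100

print(round(mape, 1))   # 6.7
```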

Common pitfalls in time series metrics

  • Ignoring seasonality: Errors may look large if you don't account for repeating patterns.
  • Data leakage: Letting the model see future data during training gives unrealistically low errors.
  • Overfitting: Very low training error but high test error means the model memorizes the past but fails on the future.
  • Using MAPE with zeros: MAPE is infinite or undefined when any actual value is zero.
  • Not checking residuals: Errors should look random; patterns mean the model is missing something.
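The MAPE-with-zeros pitfall is easy to demonstrate (a sketch; the fallback to `None` is an assumption for illustration, not part of the lesson):

```python
actual    = [0, 100, 200]   # first actual value is zero
predicted = [5, 90, 210]

try:
    # |error| / actual divides by zero at the first time point.
    mape = sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual) * 100
except ZeroDivisionError:
    mape = None   # MAPE is undefined here

print(mape)   # None
```

Common workarounds are to drop the zero-valued points or switch to a symmetric variant such as sMAPE.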
Self-check question

Your time series model has an MAE of 5 on training data but 30 on test data. Is it good?

Answer: No, this shows overfitting. The model predicts training data well but fails on new data. You should improve the model or get more data.

Key Result
MAE, RMSE, and MAPE are the key time series metrics: MAE for average error in data units, RMSE to penalize large misses, and MAPE for scale-free percentage comparison.