In time series forecasting, we want to see how close our predictions are to actual values over time. Metrics like Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) summarize the typical size of errors. MAE is simple and reports the average error in the same units as the data. RMSE gives more weight to big mistakes, so it helps catch big misses. Mean Absolute Percentage Error (MAPE) expresses error as a percentage, which helps compare models across different scales. Choosing the right metric depends on which mistakes matter more in your case.
Time series problems usually predict numbers, not categories, so a confusion matrix does not apply. Instead, we look at error values over time. Here is an example of errors for 5 time points:
Time:                       1    2    3    4    5
Actual:                     100  150  130  170  160
Predicted:                  90   160  120  180  155
Error (actual - predicted): 10   -10  10   -10  5
We then take the absolute errors and calculate metrics like MAE = (10+10+10+10+5)/5 = 9 and RMSE = sqrt((10²+10²+10²+10²+5²)/5) = sqrt(85) ≈ 9.22.
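The arithmetic above can be checked with a short script; the two lists mirror the table and are the only inputs.

```python
import math

# Actual and predicted values from the table above
actual = [100, 150, 130, 170, 160]
predicted = [90, 160, 120, 180, 155]

errors = [a - p for a, p in zip(actual, predicted)]

# MAE: mean of the absolute errors
mae = sum(abs(e) for e in errors) / len(errors)

# RMSE: square root of the mean squared error
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(mae)   # 9.0
print(rmse)  # ≈ 9.22
```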
In time series, we don't use precision or recall because those are classification metrics. Instead, we choose among error metrics that weight mistakes differently. For example:
- MAE treats all errors equally, good when all mistakes matter the same.
- RMSE punishes big errors more, useful when big misses are costly.
- MAPE shows error as a percent, helpful when scale changes over time.
The choice depends on your goal: RMSE if avoiding big mistakes matters most, MAE if you care about the typical error.
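To see how the percentage view works, here is MAPE computed on the same five points from the table: each absolute error is divided by the actual value, averaged, and scaled to a percent.

```python
# MAPE: mean of |error| / |actual|, expressed as a percent
actual = [100, 150, 130, 170, 160]
predicted = [90, 160, 120, 180, 155]

mape = 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)
print(round(mape, 2))  # 6.67
```

On this data the model is off by about 6.7% on average, regardless of the units of the series.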
Good values mean small errors compared to the data size. For example, if your data values are around 100, an MAE of 2 means on average you miss by 2 units, which is good. An MAE of 50 means big errors, which is bad.
Similarly, RMSE should be close to MAE when errors are consistent in size. An RMSE much larger than MAE means a few big mistakes dominate.
MAPE below 10% is often good, meaning errors are less than 10% of actual values. Above 50% is usually bad.
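One way to apply the RMSE-versus-MAE rule of thumb is to compute both on the same errors. The two error lists below are made up for illustration: they have the same MAE, but one contains a single big miss.

```python
import math

def mae(errors):
    # Mean absolute error
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    # Root mean squared error
    return math.sqrt(sum(e * e for e in errors) / len(errors))

consistent = [5, -5, 5, -5, 5]   # similar-sized errors
spiky = [1, -1, 1, -1, 21]       # one big miss

print(mae(consistent), rmse(consistent))  # 5.0 and 5.0: RMSE ≈ MAE
print(mae(spiky), rmse(spiky))            # 5.0 but RMSE ≈ 9.43: big misses present
```

Both lists have MAE = 5, yet the spiky one has a much larger RMSE, which is exactly the warning sign described above.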
- Ignoring seasonality: Errors might look big if you don't consider repeating patterns.
- Data leakage: Letting information from the future into training (for example, a random train/test split instead of a chronological one) gives unrealistically low errors.
- Overfitting: Very low training error but high test error means the model memorizes the past but fails on the future.
- Using MAPE with zeros: MAPE can be infinite or undefined if actual values are zero.
- Not checking residuals: Errors should look random; patterns mean the model is missing something.
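The MAPE-with-zeros pitfall can be handled by skipping points where the actual value is zero. This is one common workaround, sketched below, not a standard definition; the function name `safe_mape` is just an illustration.

```python
def safe_mape(actual, predicted):
    # Skip points where the actual value is zero; MAPE is undefined there
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        return None  # MAPE undefined for this whole series
    return 100 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)

print(safe_mape([0, 100, 200], [5, 110, 190]))  # uses only the nonzero actuals
print(safe_mape([0, 0], [1, 2]))                # None: no valid points
```

Dropping zero points changes what the metric measures, so it is worth reporting how many points were skipped.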
Your time series model has an MAE of 5 on training data but 30 on test data. Is it good?
Answer: No, this shows overfitting. The model predicts the training data well but fails on new data. You should improve the model or get more data.
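A quick way to spot the gap described in the answer is to compare the same metric on both splits. The 2× threshold here is an arbitrary illustration, not a standard rule, and `overfit_gap` is a hypothetical helper name.

```python
def overfit_gap(train_mae, test_mae, ratio=2.0):
    # Flag when test error is far above training error (ratio is arbitrary)
    return test_mae > ratio * train_mae

print(overfit_gap(5, 30))  # True: the model does much worse on unseen data
print(overfit_gap(5, 6))   # False: train and test errors are close
```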