In time series forecasting, we want to know how close our predictions are to the actual values over time. Metrics like Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) summarize the typical size of the errors. MAE is simple and reports the average error in the same units as the data. RMSE gives more weight to large errors, so it helps expose big misses. Mean Absolute Percentage Error (MAPE) expresses error as a percentage of the actual value, which helps when comparing series on different scales. The right metric depends on which mistakes matter most in your case.
Time series evaluation metrics in ML Python - Model Metrics & Evaluation
Time series problems usually predict numbers, not categories, so a confusion matrix does not apply. Instead, we look at the error at each time step. Here is an example with 5 time points:
Time:            1    2    3    4    5
Actual:        100  150  130  170  160
Predicted:      90  160  120  180  155
Abs. error:     10   10   10   10    5
We then calculate metrics like MAE = (10+10+10+10+5)/5 = 9, RMSE = sqrt((10²+10²+10²+10²+5²)/5) ≈ 9.22.
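The arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch that uses only the standard library; the variable names are illustrative and not part of the original example.

```python
import math

# Values copied from the five-point example above
actual = [100, 150, 130, 170, 160]
predicted = [90, 160, 120, 180, 155]

# Absolute error at each time point
abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]

# MAE: mean of the absolute errors
mae = sum(abs_errors) / len(abs_errors)

# RMSE: square root of the mean of the squared errors
rmse = math.sqrt(sum(e ** 2 for e in abs_errors) / len(abs_errors))

print(abs_errors)      # [10, 10, 10, 10, 5]
print(mae)             # 9.0
print(round(rmse, 2))  # 9.22
```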
In time series, we don't use precision or recall because those are for classification. Instead, we balance between metrics that treat errors differently. For example:
- MAE treats all errors equally, good when all mistakes matter the same.
- RMSE punishes big errors more, useful when big misses are costly.
- MAPE shows error as a percent, helpful when scale changes over time.
Choosing depends on your goal: avoid big mistakes or keep average error low.
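To see how differently MAE and RMSE react to one large miss, the sketch below compares two hypothetical error sets (made up for illustration, not taken from the example above) that share the same MAE.

```python
import math

def mae(errors):
    """Mean of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Square root of the mean squared error."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

# Hypothetical errors: same total absolute error, distributed differently
steady = [5, 5, 5, 5, 5]   # every forecast misses by the same amount
spiky = [1, 1, 1, 1, 21]   # four small misses and one large one

print(mae(steady), round(rmse(steady), 2))  # 5.0 5.0
print(mae(spiky), round(rmse(spiky), 2))    # 5.0 9.43
```

Both sets have an MAE of 5, but the single large miss nearly doubles the RMSE of the second set.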
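Good values mean small errors relative to the scale of the data. For example, if your data values are around 100, an MAE of 2 means you miss by 2 units on average, which is good. An MAE of 50 means the typical miss is half the size of the values themselves, which is bad.
RMSE is never smaller than MAE, and the two are equal only when every error has the same magnitude. An RMSE close to MAE therefore means the errors are consistent in size, while a much larger RMSE means a few big mistakes dominate.
As a rule of thumb, a MAPE below 10% is often considered good, meaning errors average less than 10% of the actual values, and a MAPE above 50% is usually bad. The acceptable range still depends on the domain and on how hard the series is to forecast.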
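For the five-point example earlier on this page, MAPE can be computed as follows. This is a minimal sketch in plain Python; it assumes no actual value is zero, which holds for this data.

```python
# Values copied from the five-point example
actual = [100, 150, 130, 170, 160]
predicted = [90, 160, 120, 180, 155]

# Absolute error at each point, as a fraction of the actual value
pct_errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]

# MAPE: mean of those fractions, expressed in percent
mape = 100 * sum(pct_errors) / len(pct_errors)

print(round(mape, 2))  # 6.67
```

At roughly 6.7%, this forecast falls under the 10% rule of thumb.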
- Ignoring seasonality: Errors can look large if the model does not account for repeating patterns.
- Data leakage: Using future data to predict the past gives unrealistically low errors.
- Overfitting: Very low training error but high test error means the model memorizes the past but fails on the future.
- Using MAPE with zeros: MAPE is undefined (division by zero) when an actual value is zero, and it blows up when actual values are close to zero.
- Not checking residuals: Errors should look random; a visible pattern means the model is missing something.
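The MAPE pitfall in the list above can be guarded against in code. The sketch below shows one possible workaround, not a standard definition: it skips points whose actual value is zero and raises an error if nothing is left. The function name `mape_skip_zeros` and the sample data are hypothetical. Skipping points changes what the metric measures, so it is worth reporting how many points were dropped.

```python
def mape_skip_zeros(actual, predicted):
    """MAPE in percent, computed only over points whose actual value is non-zero."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)

# Hypothetical series where the first actual value is zero
actual = [0, 50, 100]
predicted = [5, 45, 110]

# The first point is skipped; the other two are each off by 10%
print(round(mape_skip_zeros(actual, predicted), 2))  # 10.0
```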
Your time series model has an MAE of 5 on training data but 30 on test data. Is it good?
Answer: No, this shows overfitting. The model predicts training data well but fails on new data. You should improve the model or get more data.
Practice
Solution
Step 1: Understand the definition of MAE
MAE calculates the average of the absolute differences between predicted and actual values, showing average error size.
Step 2: Compare with other metrics
MSE and RMSE square errors, while R-squared measures variance explained, not average error.
Final Answer:
Mean Absolute Error (MAE) -> Option B
Quick Check:
Average absolute difference = MAE [OK]
- Confusing MAE with MSE or RMSE
- Thinking R-squared measures error size
- Assuming RMSE is the same as MAE
Solution
Step 1: Recall RMSE formula
RMSE is the square root of the average of squared errors, so it must include squaring, averaging, then square root.
Step 2: Check each option
RMSE = \(\sqrt{\frac{1}{n} \sum_{i=1}^n e_i^2}\) matches the formula exactly.
RMSE = \(\sum_{i=1}^n e_i^2\) misses averaging and root.
RMSE = \(\frac{1}{n} \sum_{i=1}^n |e_i|\) is MAE.
RMSE = \(\frac{1}{n} \sum_{i=1}^n e_i\) is mean error (not squared).
Final Answer:
RMSE = \(\sqrt{\frac{1}{n} \sum_{i=1}^n e_i^2}\) -> Option D
Quick Check:
RMSE = sqrt(mean squared errors) [OK]
- Forgetting to take square root
- Using absolute errors instead of squared
- Not dividing by number of points
Solution
Step 1: Calculate errors and square them
Errors: 3-2=1, 5-5=0, 2-4=-2, 7-8=-1. Squared errors: 1, 0, 4, 1.
Step 2: Compute average of squared errors
Sum = 1+0+4+1=6. Average = 6/4 = 1.5.
Final Answer:
1.5 -> Option A
Quick Check:
Sum squared errors / count = 1.5 [OK]
- Using absolute errors instead of squared
- Forgetting to average over all points
- Mixing predicted and actual values
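The steps in this solution can be reproduced in plain Python. The two lists below are read off the subtractions shown in Step 1; which list holds the actual values and which holds the predictions is an assumption, and it does not change the result because the errors are squared.

```python
# Number pairs taken from the subtractions in Step 1
first = [3, 5, 2, 7]
second = [2, 5, 4, 8]

# Step 1: errors and squared errors
errors = [a - b for a, b in zip(first, second)]
squared = [e ** 2 for e in errors]

# Step 2: average of the squared errors (MSE)
mse = sum(squared) / len(squared)

print(errors)   # [1, 0, -2, -1]
print(squared)  # [1, 0, 4, 1]
print(mse)      # 1.5
```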
def mae(actual, predicted):
    errors = [a - p for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)
Solution
Step 1: Analyze error calculation
The code calculates errors as differences but does not take absolute values, which MAE requires.
Step 2: Understand MAE definition
MAE is mean of absolute errors, so errors must be wrapped with abs() before summing.
Final Answer:
Errors should be absolute values before summing -> Option C
Quick Check:
MAE needs absolute errors [OK]
- Skipping absolute value in error calculation
- Dividing by wrong denominator
- Confusing MAE with MSE
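A corrected version of the function from this question might look like the sketch below. The `abs()` call is the fix described in the solution; the input check and the sample data are additions for illustration only.

```python
def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # abs() stops positive and negative errors from cancelling each other out
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)

# Illustrative data: the signed errors are 1, 0, -2, -1
print(mae([3, 5, 2, 7], [2, 5, 4, 8]))  # 1.0
```

Without `abs()`, the same inputs would give (1 + 0 - 2 - 1) / 4 = -0.5, which understates the error and even has the wrong sign for an error magnitude.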
Solution
Step 1: Interpret MAE and RMSE values
Model B has lower MAE but higher RMSE, meaning it has better average error but more large errors. Model A has lower RMSE, indicating fewer large errors.
Step 2: Decide which metric matters more
RMSE penalizes large errors more, so lower RMSE often means more reliable predictions without big mistakes.
Final Answer:
Model A, because lower RMSE means fewer large errors -> Option A
Quick Check:
Lower RMSE means fewer big errors [OK]
- Choosing model with lower MAE ignoring RMSE
- Thinking higher RMSE is better
- Expecting MAE and RMSE to be equal
