Time series evaluation metrics in ML Python - Model Metrics & Evaluation

Which metric matters for time series and WHY

In time series forecasting, we want to know how close predictions are to the actual values over time. Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) both summarize the typical size of the errors. MAE is the simplest: it is the average absolute error, in the same units as the data. RMSE squares errors before averaging, so it gives more weight to big mistakes and helps expose large misses. Mean Absolute Percentage Error (MAPE) expresses each error as a percentage of the actual value, which helps compare series on different scales. The right metric depends on which mistakes matter most in your case.
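For reference, the three metrics can be written out as follows. The notation is borrowed from the practice questions further down (\(y_i\) for the actual value, \(\hat{y}_i\) for the prediction, \(n\) for the number of points); the MAPE form shown is the common textbook definition and assumes no \(y_i\) is zero.

```latex
\begin{align*}
e_i &= y_i - \hat{y}_i \\
\mathrm{MAE} &= \frac{1}{n} \sum_{i=1}^{n} |e_i| \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2} \\
\mathrm{MAPE} &= \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{e_i}{y_i} \right|
\end{align*}
```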

Confusion matrix or equivalent visualization

Time series problems usually predict numbers, not categories, so a confusion matrix does not apply. Instead, we look at the error at each time step. Here is an example of absolute errors for 5 time points:

    Time:           1    2    3    4    5
    Actual:       100  150  130  170  160
    Predicted:     90  160  120  180  155
    Abs. error:    10   10   10   10    5

We then calculate metrics like MAE = (10+10+10+10+5)/5 = 9, RMSE = sqrt((10²+10²+10²+10²+5²)/5) ≈ 9.22.
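The arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch using only the standard library; the variable names are illustrative, and the MAPE line is an addition that applies the usual percentage-error definition to the same five points.

```python
import math

actual = [100, 150, 130, 170, 160]
predicted = [90, 160, 120, 180, 155]

# Absolute error at each time point: |actual - predicted|
abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]

# MAE: mean of the absolute errors
mae = sum(abs_errors) / len(abs_errors)

# RMSE: square root of the mean of the squared errors
rmse = math.sqrt(sum(e ** 2 for e in abs_errors) / len(abs_errors))

# MAPE: mean of (absolute error / actual value), expressed in percent
mape = 100 * sum(e / a for e, a in zip(abs_errors, actual)) / len(actual)

print(abs_errors)      # [10, 10, 10, 10, 5]
print(mae)             # 9.0
print(round(rmse, 2))  # 9.22
print(round(mape, 2))  # 6.67
```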

Precision vs Recall tradeoff (or equivalent)

In time series, we don't use precision or recall because those are for classification. Instead, we balance between metrics that treat errors differently. For example:

  • MAE treats all errors equally, good when all mistakes matter the same.
  • RMSE punishes big errors more, useful when big misses are costly.
  • MAPE shows error as a percent, helpful when scale changes over time.

Choosing depends on your goal: avoid big mistakes or keep average error low.
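To see the difference between MAE and RMSE concretely, here is a small sketch with two made-up error lists (hypothetical values, not from any real model). Both have the same MAE, but one concentrates its error in a single large miss.

```python
import math


def mae(errors):
    """Mean of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Square root of the mean of the squared errors."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


steady = [5, 5, 5, 5]   # every forecast is off by the same amount
spiky = [1, 1, 1, 17]   # mostly accurate, with one large miss

print(mae(steady), rmse(steady))          # 5.0 5.0
print(mae(spiky), round(rmse(spiky), 2))  # 5.0 8.54
```

MAE cannot tell the two cases apart, while RMSE rises sharply for the list with the large miss.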

What "good" vs "bad" metric values look like

Good values mean small errors compared to the data size. For example, if your data values are around 100, an MAE of 2 means on average you miss by 2 units, which is good. An MAE of 50 means big errors, which is bad.

RMSE is never smaller than MAE, and the two are close when the errors are similar in size. An RMSE much larger than the MAE means a few errors are much bigger than the rest.

As a rough rule of thumb, a MAPE below 10% is often considered good: on average, errors are less than 10% of the actual values. Above 50% is usually bad.

Common pitfalls in time series metrics
  • Ignoring seasonality: Errors can look large if the model does not account for repeating patterns.
  • Data leakage: Using future data to predict the past gives unrealistically low errors.
  • Overfitting: Very low training error but high test error means the model memorizes the past but fails on the future.
  • Using MAPE with zeros: MAPE is infinite or undefined when an actual value is zero.
  • Not checking residuals: Errors should look random; a visible pattern means the model is missing something.
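Two of the pitfalls above can be guarded against in code. The sketch below is illustrative only: `chronological_split` and `mape_skip_zeros` are hypothetical helper names, and skipping zero actuals is just one possible way to handle them (an assumption, not the only convention).

```python
def chronological_split(series, test_size):
    """Hold out the last `test_size` points so training never sees the future."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


def mape_skip_zeros(actual, predicted):
    """MAPE in percent over points whose actual value is nonzero.

    Returns None when every actual value is zero, since MAPE is undefined there.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        return None
    return 100 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


# Split by position in time, never by random shuffling.
train, test = chronological_split([10, 12, 0, 15, 20, 18], test_size=2)
print(train, test)  # [10, 12, 0, 15] [20, 18]

# The first point has actual == 0 and is left out of the average.
print(round(mape_skip_zeros([0, 50, 100], [5, 45, 110]), 2))  # 10.0
```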
Self-check question

Your time series model has an MAE of 5 on training data but 30 on test data. Is it good?

Answer: No. The much higher test error points to overfitting: the model fits the training data well but fails on new data. Simplify or regularize the model, or get more data.

Key Result
MAE, RMSE, and MAPE are key metrics to measure average and weighted errors in time series predictions.

Practice

1. Which metric measures the average absolute difference between predicted and actual values in time series forecasting?
easy
A. Mean Squared Error (MSE)
B. Mean Absolute Error (MAE)
C. Root Mean Squared Error (RMSE)
D. R-squared (Coefficient of Determination)

Solution

  1. Step 1: Understand the definition of MAE

    MAE calculates the average of the absolute differences between predicted and actual values, showing average error size.
  2. Step 2: Compare with other metrics

    MSE and RMSE square errors, while R-squared measures variance explained, not average error.
  3. Final Answer:

    Mean Absolute Error (MAE) -> Option B
  4. Quick Check:

    Average absolute difference = MAE [OK]
Hint: MAE uses absolute differences, no squaring involved [OK]
Common Mistakes:
  • Confusing MAE with MSE or RMSE
  • Thinking R-squared measures error size
  • Assuming RMSE is the same as MAE
2. Which of the following is the correct formula for Root Mean Squared Error (RMSE) given errors \(e_i = y_i - \hat{y}_i\) for \(n\) points?
easy
A. RMSE = \(\sum_{i=1}^n e_i^2\)
B. RMSE = \(\frac{1}{n} \sum_{i=1}^n |e_i|\)
C. RMSE = \(\frac{1}{n} \sum_{i=1}^n e_i\)
D. RMSE = \(\sqrt{\frac{1}{n} \sum_{i=1}^n e_i^2}\)

Solution

  1. Step 1: Recall RMSE formula

    RMSE is the square root of the average of squared errors, so it must include squaring, averaging, then square root.
  2. Step 2: Check each option

    Option D, RMSE = \(\sqrt{\frac{1}{n} \sum_{i=1}^n e_i^2}\), matches the formula exactly. Option A, \(\sum_{i=1}^n e_i^2\), misses the averaging and the square root. Option B, \(\frac{1}{n} \sum_{i=1}^n |e_i|\), is MAE. Option C, \(\frac{1}{n} \sum_{i=1}^n e_i\), is the mean error (not squared).
  3. Final Answer:

    RMSE = \(\sqrt{\frac{1}{n} \sum_{i=1}^n e_i^2}\) -> Option D
  4. Quick Check:

    RMSE = sqrt(mean squared errors) [OK]
Hint: RMSE = square root of average squared errors [OK]
Common Mistakes:
  • Forgetting to take square root
  • Using absolute errors instead of squared
  • Not dividing by number of points
3. Given actual values \([3, 5, 2, 7]\) and predicted values \([2, 5, 4, 8]\), what is the Mean Squared Error (MSE)?
medium
A. 1.5
B. 1.25
C. 2.0
D. 0.75

Solution

  1. Step 1: Calculate errors and square them

    Errors: 3-2=1, 5-5=0, 2-4=-2, 7-8=-1. Squared errors: 1, 0, 4, 1.
  2. Step 2: Compute average of squared errors

    Sum = 1+0+4+1=6. Average = 6/4 = 1.5.
  3. Final Answer:

    1.5 -> Option A
  4. Quick Check:

    Sum squared errors / count = 1.5 [OK]
Hint: Square errors, sum, then divide by count [OK]
Common Mistakes:
  • Using absolute errors instead of squared
  • Forgetting to average over all points
  • Mixing predicted and actual values
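The worked solution for question 3 can be reproduced in a few lines of Python. This is a quick check with illustrative variable names, not part of the original question.

```python
actual = [3, 5, 2, 7]
predicted = [2, 5, 4, 8]

# Square each difference so that negative errors do not cancel positive ones
squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]

# MSE: mean of the squared errors
mse = sum(squared_errors) / len(squared_errors)

print(squared_errors)  # [1, 0, 4, 1]
print(mse)             # 1.5
```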
4. Identify the error in this Python code calculating MAE for time series predictions:
def mae(actual, predicted):
    errors = [a - p for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)
medium
A. Use multiplication instead of subtraction in errors
B. Divide by sum of errors instead of length
C. Errors should be absolute values before summing
D. No error, code is correct

Solution

  1. Step 1: Analyze error calculation

    The code calculates errors as differences but does not take absolute values, which MAE requires.
  2. Step 2: Understand MAE definition

    MAE is mean of absolute errors, so errors must be wrapped with abs() before summing.
  3. Final Answer:

    Errors should be absolute values before summing -> Option C
  4. Quick Check:

    MAE needs absolute errors [OK]
Hint: MAE sums absolute errors, not raw differences [OK]
Common Mistakes:
  • Skipping absolute value in error calculation
  • Dividing by wrong denominator
  • Confusing MAE with MSE
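For comparison, here is one way the function from question 4 could look once the fix from option C is applied. The sample inputs are illustrative; on these inputs the version without `abs()` would return -0.5, because positive and negative errors cancel.

```python
def mae(actual, predicted):
    # Take the absolute value of each difference so that
    # over- and under-predictions cannot cancel each other out.
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


print(mae([3, 5, 2, 7], [2, 5, 4, 8]))  # 1.0
```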
5. You have two forecasting models evaluated on the same dataset. Model A has MAE=2.5 and RMSE=3.0, Model B has MAE=2.0 and RMSE=3.5. Which model is generally better and why?
hard
A. Model A, because lower RMSE means fewer large errors
B. Model B, because higher RMSE indicates better fit
C. Model B, because lower MAE means better average error
D. Model A, because MAE and RMSE must be equal for best model

Solution

  1. Step 1: Interpret MAE and RMSE values

    Model B has lower MAE but higher RMSE, meaning it has better average error but more large errors. Model A has lower RMSE, indicating fewer large errors.
  2. Step 2: Decide which metric matters more

    RMSE penalizes large errors more, so lower RMSE often means more reliable predictions without big mistakes.
  3. Final Answer:

    Model A, because lower RMSE means fewer large errors -> Option A
  4. Quick Check:

    Lower RMSE means fewer big errors [OK]
Hint: Lower RMSE means fewer big errors; prefer it if large errors matter [OK]
Common Mistakes:
  • Choosing model with lower MAE ignoring RMSE
  • Thinking higher RMSE is better
  • Expecting MAE and RMSE to be equal