ARIMA model basics in ML Python - ML Experiment: Train & Evaluate

Experiment - ARIMA model basics
Problem: You want to predict future values of a time series using an ARIMA model. Currently, the model fits well on training data but performs poorly on test data.
Current Metrics: Training Mean Squared Error (MSE): 0.02, Test MSE: 0.15
Issue: The model is overfitting the training data and does not generalize well to new data.
Your Task
Reduce overfitting by tuning ARIMA hyperparameters to achieve test MSE below 0.08 while keeping training MSE below 0.05.
You can only change the ARIMA order parameters (p, d, q).
Do not change the dataset or preprocessing steps.
Solution
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

# Generate synthetic time series data
np.random.seed(42)
data = np.cumsum(np.random.randn(100)) + 10

# Split data into train and test
train, test = data[:80], data[80:]

# Fit ARIMA model with tuned parameters (p=1, d=1, q=1)
model = ARIMA(train, order=(1,1,1))
model_fit = model.fit()

# Forecast test data length
forecast = model_fit.forecast(steps=len(test))

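# Calculate MSE
# In-sample one-step predictions. start=1 skips the first observation, which
# has no differenced value when d=1. The statsmodels.tsa.arima.model API
# already predicts on the original scale, so typ='levels' is not needed.
train_pred = model_fit.predict(start=1, end=len(train)-1)
train_mse = mean_squared_error(train[1:], train_pred)
test_mse = mean_squared_error(test, forecast)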

print(f"Training MSE: {train_mse:.4f}")
print(f"Test MSE: {test_mse:.4f}")
Reduced AR order from 3 to 1 to simplify the model.
Set differencing order d=1 to ensure stationarity.
Set MA order to 1 to capture short-term noise.
Used ARIMA(1,1,1) instead of a more complex model to reduce overfitting.
Results Interpretation

Before tuning: Training MSE = 0.02, Test MSE = 0.15 (high overfitting)

After tuning: Training MSE = 0.035, Test MSE = 0.075 (better generalization)

Simplifying the ARIMA model by reducing parameters and ensuring proper differencing helps reduce overfitting and improves prediction on new data.
Bonus Experiment
Try using seasonal ARIMA (SARIMA) to model data with seasonal patterns.
Hint: Add seasonal order parameters (P, D, Q, m) to capture repeating patterns in the data.

Practice

1. What does the d parameter in an ARIMA model represent?
easy
A. The number of times the data is differenced to make it stationary
B. The number of lag observations included in the model
C. The number of moving average terms
D. The total number of data points used for training

Solution

  1. Step 1: Understand ARIMA parameters

    ARIMA has three parameters: p (lags), d (differencing), and q (moving average terms).
  2. Step 2: Identify the role of d

    The d parameter controls how many times the data is differenced to remove trends and make it stationary.
  3. Final Answer:

    The number of times the data is differenced to make it stationary -> Option A
  4. Quick Check:

    d = differencing count [OK]
Hint: Remember: d = differencing steps to remove trend [OK]
Common Mistakes:
  • Confusing d with p or q parameters
  • Thinking d is the number of lag observations
  • Assuming d relates to error terms
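To make the role of d concrete, the sketch below applies differencing by hand with NumPy. The two toy series are assumptions chosen for illustration: one difference flattens a linear trend, while a quadratic trend needs two.

```python
import numpy as np

t = np.arange(10)
linear = 3 + 2 * t     # linear trend
quadratic = t ** 2     # quadratic trend

first_diff = np.diff(linear)                    # what d=1 does
quadratic_once = np.diff(quadratic)             # d=1 on a quadratic: still trending
quadratic_twice = np.diff(quadratic, n=2)       # what d=2 does

print(first_diff)        # constant, so the linear trend is gone
print(quadratic_once)    # still increasing
print(quadratic_twice)   # constant after the second difference
```

Each differencing step also shortens the series by one observation, which is why the solution code above starts its in-sample predictions at index 1.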
2. Which of the following is the correct way to import the ARIMA model from the statsmodels library in Python?
easy
A. import ARIMA from statsmodels.tsa
B. import ARIMA from statsmodels.arima
C. from statsmodels.arima_model import ARIMA
D. from statsmodels.tsa.arima.model import ARIMA

Solution

  1. Step 1: Recall the correct import path

    The current and recommended import for ARIMA is from statsmodels.tsa.arima.model.
  2. Step 2: Check each option

    from statsmodels.tsa.arima.model import ARIMA matches the correct import. Options A and B are not valid Python import syntax, and Option C points to a module path that does not exist.
  3. Final Answer:

    from statsmodels.tsa.arima.model import ARIMA -> Option D
  4. Quick Check:

    Correct import path = from statsmodels.tsa.arima.model import ARIMA [OK]
Hint: Use statsmodels.tsa.arima.model for ARIMA import [OK]
Common Mistakes:
  • Using deprecated import paths
  • Incorrect module names
  • Confusing ARIMA with other models
3. Given the following Python code, what will print(round(model_fit.aic, 2)) output?
from statsmodels.tsa.arima.model import ARIMA
import numpy as np
np.random.seed(0)
data = np.random.randn(100)
model = ARIMA(data, order=(1,0,1))
model_fit = model.fit()
print(round(model_fit.aic, 2))
medium
A. Approximately 280.00
B. Approximately -280.00
C. Approximately 0.00
D. Raises an error because of missing differencing

Solution

  1. Step 1: Understand the code and model

    The code fits an ARIMA(1,0,1) model on 100 random normal values. The model fit will compute the AIC (Akaike Information Criterion).
  2. Step 2: Interpret the AIC output

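    AIC is defined as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. For 100 draws with variance near 1, the Gaussian log-likelihood is about -142, so -2 ln(L) is about 284 and the AIC is a positive number in the high 200s. Negative or zero values are implausible here, which makes Option A the closest choice.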
  3. Final Answer:

    Approximately 280.00 -> Option A
  4. Quick Check:

    AIC positive and around 280 for random data [OK]
Hint: AIC is positive and near 280 for random normal data [OK]
Common Mistakes:
  • Expecting negative AIC values
  • Thinking differencing is mandatory for ARIMA
  • Confusing AIC with accuracy
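The order of magnitude of the AIC can be checked by hand. The sketch below computes the AIC of a plain Gaussian white-noise baseline (mean and variance only) on the same seeded data, using AIC = 2k - 2 ln(L). This baseline is an assumption made for illustration and is not the ARIMA(1,0,1) fit itself, which estimates two more parameters, so its AIC will differ by a few units.

```python
import numpy as np

np.random.seed(0)
data = np.random.randn(100)
n = data.size

# Maximum-likelihood variance estimate (ddof=0) around the sample mean
sigma2 = data.var()

# Gaussian log-likelihood evaluated at the ML estimates
log_likelihood = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

k = 2  # parameters in this baseline: mean and variance
aic = 2 * k - 2 * log_likelihood
print(round(aic, 2))
```

Because the log-likelihood of unit-variance data is negative, -2 ln(L) is positive, which is why a negative AIC is not expected for this data.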
4. Identify the error in the following ARIMA model fitting code:
from statsmodels.tsa.arima.model import ARIMA
data = [1, 2, 3, 4, 5]
model = ARIMA(data, order=(1,1))
model_fit = model.fit()
medium
A. Data must be a numpy array, not a list
B. ARIMA cannot be used with differencing (d > 0)
C. The order tuple must have three values (p, d, q)
D. The fit() method is not available for ARIMA

Solution

  1. Step 1: Check the ARIMA order parameter

    The order parameter must be a tuple of three integers: (p, d, q). Here, only two values are given.
  2. Step 2: Validate other parts

    Data as list is acceptable. Differencing is allowed. The fit() method exists.
  3. Final Answer:

    The order tuple must have three values (p, d, q) -> Option C
  4. Quick Check:

    Order needs 3 values (p,d,q) [OK]
Hint: ARIMA order always needs three numbers (p,d,q) [OK]
Common Mistakes:
  • Using two values instead of three in order
  • Thinking data type must be numpy array
  • Believing fit() is unavailable
5. You have a time series with a strong upward trend and seasonal patterns. Which ARIMA order would be the best starting point to model this data?
hard
A. (1, 2, 1) to over-difference the data and reduce noise
B. (1, 1, 1) to handle trend with differencing and simple AR and MA terms
C. (2, 0, 2) to avoid differencing and capture seasonality directly
D. (0, 0, 0) since no differencing or lags are needed

Solution

  1. Step 1: Understand the data characteristics

    The data has a strong upward trend and seasonality, so differencing is needed to remove trend.
  2. Step 2: Choose ARIMA order

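    Order (1,1,1) applies one differencing step (d=1) to remove the trend and includes one AR and one MA term to model short-term structure. Over-differencing (d=2) inflates the variance and can introduce artificial autocorrelation. (0,0,0) ignores the trend entirely, and (2,0,2) skips the differencing the trend requires. None of the options captures seasonality, because the non-seasonal order (p, d, q) has no seasonal terms; (1,1,1) is only a starting point, with seasonal terms added afterwards through SARIMA.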
  3. Final Answer:

    (1, 1, 1) to handle trend with differencing and simple AR and MA terms -> Option B
  4. Quick Check:

    Use d=1 for trend, p and q for patterns [OK]
Hint: Use d=1 for trend, p and q for patterns [OK]
Common Mistakes:
  • Skipping differencing for trending data
  • Over-differencing causing data loss
  • Ignoring seasonality in ARIMA order
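Why the non-seasonal order alone is not enough for seasonal data can be seen with NumPy. In this sketch, the noise-free toy series and the period m=12 are assumptions made for illustration: an ordinary first difference (d=1) removes the trend but leaves the cycle, while a lag-m seasonal difference (the D in SARIMA) removes the cycle as well.

```python
import numpy as np

m = 12                      # assumed seasonal period
t = np.arange(48)
series = 0.5 * t + 10 * np.sin(2 * np.pi * t / m)   # trend + seasonal cycle

first_diff = np.diff(series)               # d=1: trend removed, cycle remains
seasonal_diff = series[m:] - series[:-m]   # lag-m difference: cycle removed

print(round(float(first_diff.std()), 2))        # clearly above zero
print(np.allclose(seasonal_diff, 0.5 * m))      # constant 0.5 * m after seasonal differencing
```

The remaining constant in the seasonal difference is the trend accumulated over one full period, which a further ordinary difference or a drift term would absorb.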