Stationarity and differencing in ML Python - ML Experiment: Train & Evaluate

Experiment - Stationarity and differencing

Problem: You have a time series dataset with a clear upward trend. The model you built to forecast future values performs poorly because the data is not stationary.

Current Metrics: Mean Squared Error (MSE) on validation set: 1500

Issue: The time series is non-stationary, so its mean shifts over time and the model struggles to learn a stable pattern. This leads to high prediction error.

Your Task

Make the time series stationary by applying differencing, then retrain the model to reduce validation MSE below 800.

You can only apply differencing once or twice.

Do not change the model architecture or hyperparameters.

Use the same train-test split.

Solution

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate sample non-stationary time series data
np.random.seed(42)
time = np.arange(100)
data = 0.5 * time + np.random.normal(size=100)  # Upward trend

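# Function to test stationarity with the Augmented Dickey-Fuller (ADF) test.
# Returns True when the p-value is below 0.05 (unit-root null rejected).
def test_stationarity(timeseries):
    result = adfuller(timeseries)
    print(f'ADF Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    return result[1] < 0.05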

# Check stationarity before differencing
print('Before differencing:')
stationary = test_stationarity(data)

# Prepare train-test split
train_size = 80
train, test = data[:train_size], data[train_size:]

# Train simple linear regression model on original data
model = LinearRegression()
X_train = np.arange(train_size).reshape(-1, 1)
y_train = train
model.fit(X_train, y_train)

X_test = np.arange(train_size, 100).reshape(-1, 1)
y_test = test
predictions = model.predict(X_test)
mse_before = mean_squared_error(y_test, predictions)
print(f'MSE before differencing: {mse_before:.2f}')

# Apply first-order differencing
diff_data = np.diff(data, n=1)

# Check stationarity after differencing
print('\nAfter first differencing:')
stationary_diff = test_stationarity(diff_data)

# Prepare train-test split for differenced data
train_diff, test_diff = diff_data[:train_size-1], diff_data[train_size-1:]

# Train model on differenced data
model_diff = LinearRegression()
X_train_diff = np.arange(len(train_diff)).reshape(-1, 1)
y_train_diff = train_diff
model_diff.fit(X_train_diff, y_train_diff)

X_test_diff = np.arange(len(train_diff), len(diff_data)).reshape(-1, 1)
y_test_diff = test_diff
predictions_diff = model_diff.predict(X_test_diff)
mse_after = mean_squared_error(y_test_diff, predictions_diff)
print(f'MSE after first differencing: {mse_after:.2f}')

Applied first-order differencing to the time series to remove the trend and make it stationary.

Tested stationarity before and after differencing using the Augmented Dickey-Fuller test.

Retrained the same linear regression model on the differenced data without changing model parameters.

Results Interpretation

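Before differencing: MSE = 1500, data non-stationary (ADF p-value > 0.05)

After first differencing: MSE = 600, data stationary (ADF p-value < 0.05)

These are the figures from the exercise scenario; the synthetic sample in the solution code will generally print different numbers. The second MSE is also measured on differences (step-to-step changes) rather than on the original levels, so the two values are only directly comparable once the differenced predictions are converted back to levels.

First-order differencing removes the trend, so the mean of the series no longer drifts over time. That gives the model a stable target to learn, which is what reduces the prediction error.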
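One way to put both errors on the same scale is to undo the differencing before scoring. The sketch below is an illustration, not part of the original solution: it reuses the same synthetic data and split, and the names `pred_levels` and `last_train_value` are introduced here only for the example. It rebuilds level forecasts by adding the cumulative sum of predicted differences to the last observed training value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Same synthetic series and split as in the solution
np.random.seed(42)
time = np.arange(100)
data = 0.5 * time + np.random.normal(size=100)
train_size = 80

# First-order differencing: diff_data[i] = data[i + 1] - data[i]
diff_data = np.diff(data, n=1)
train_diff = diff_data[:train_size - 1]

# Fit the same model type on the differenced training data
model_diff = LinearRegression()
X_train_diff = np.arange(len(train_diff)).reshape(-1, 1)
model_diff.fit(X_train_diff, train_diff)

# Predict the differences for the test period
X_test_diff = np.arange(len(train_diff), len(diff_data)).reshape(-1, 1)
pred_diff = model_diff.predict(X_test_diff)

# Undo the differencing (illustrative names): start from the last
# training level and accumulate the predicted step-to-step changes
last_train_value = data[train_size - 1]
pred_levels = last_train_value + np.cumsum(pred_diff)

# Score on the original scale, against the untouched test values
mse_levels = mean_squared_error(data[train_size:], pred_levels)
print(f'MSE on original scale after differencing: {mse_levels:.2f}')
```

Because the forecast errors accumulate through the cumulative sum, the error on the original scale can differ noticeably from the error measured on the differences.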

Bonus Experiment

Try applying second-order differencing if first-order differencing does not achieve stationarity or sufficient error reduction.

💡 Hint

Use np.diff(data, n=2) and repeat the training and evaluation steps.
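A minimal sketch of that hint, assuming the same synthetic data, split point, and model as the solution; the variable names ending in `2` are made up for this example. Second-order differencing shortens the series by two points, so the training slice ends at `train_size - 2` to keep the same 20 test points.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Same synthetic series and split as in the solution
np.random.seed(42)
time = np.arange(100)
data = 0.5 * time + np.random.normal(size=100)
train_size = 80

# Second-order differencing is the difference of the first differences
diff2 = np.diff(data, n=2)

# Each second difference uses three consecutive points, so only the
# first train_size - 2 values depend on training data alone
train_diff2, test_diff2 = diff2[:train_size - 2], diff2[train_size - 2:]

# Same model type, same default parameters
model_diff2 = LinearRegression()
X_train_diff2 = np.arange(len(train_diff2)).reshape(-1, 1)
model_diff2.fit(X_train_diff2, train_diff2)

X_test_diff2 = np.arange(len(train_diff2), len(diff2)).reshape(-1, 1)
predictions_diff2 = model_diff2.predict(X_test_diff2)
mse_diff2 = mean_squared_error(test_diff2, predictions_diff2)
print(f'MSE after second differencing: {mse_diff2:.2f}')
```

The stationarity check can be repeated by passing `diff2` to the `test_stationarity` function from the solution. A purely linear trend is already removed by first differencing, so a second pass on such data mainly amplifies the noise; it is more likely to help when the trend itself changes over time.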

Practice

(1/5)

1. What does it mean when a time series is stationary?

easy

A. It has missing values that need to be filled

B. It has a clear upward or downward trend

C. It contains seasonal patterns repeating over fixed intervals

D. Its statistical properties like mean and variance do not change over time
