0
0
ML Pythonml~20 mins

Stationarity and differencing in ML Python - ML Experiment: Train & Evaluate

Choose your learning style9 modes available
Experiment - Stationarity and differencing
Problem:You have a time series dataset with a clear upward trend. The model you built to forecast future values performs poorly because the data is not stationary.
Current Metrics:Mean Squared Error (MSE) on validation set: 1500
Issue:The time series is non-stationary, causing the model to struggle with learning patterns. This leads to high error in predictions.
Your Task
Make the time series stationary by applying differencing, then retrain the model to reduce validation MSE below 800.
You can only apply differencing once or twice.
Do not change the model architecture or hyperparameters.
Use the same train-test split.
Hint 1
Hint 2
Hint 3
Hint 4
Solution
ML Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate sample non-stationary time series data
np.random.seed(42)
time = np.arange(100)
data = 0.5 * time + np.random.normal(size=100)  # Upward trend

# Function to test stationarity
 def test_stationarity(timeseries):
    result = adfuller(timeseries)
    print(f'ADF Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    return result[1] < 0.05

# Check stationarity before differencing
print('Before differencing:')
stationary = test_stationarity(data)

# Prepare train-test split
train_size = 80
train, test = data[:train_size], data[train_size:]

# Train simple linear regression model on original data
model = LinearRegression()
X_train = np.arange(train_size).reshape(-1, 1)
y_train = train
model.fit(X_train, y_train)

X_test = np.arange(train_size, 100).reshape(-1, 1)
y_test = test
predictions = model.predict(X_test)
mse_before = mean_squared_error(y_test, predictions)
print(f'MSE before differencing: {mse_before:.2f}')

# Apply first-order differencing
diff_data = np.diff(data, n=1)

# Check stationarity after differencing
print('\nAfter first differencing:')
stationary_diff = test_stationarity(diff_data)

# Prepare train-test split for differenced data
train_diff, test_diff = diff_data[:train_size-1], diff_data[train_size-1:]

# Train model on differenced data
model_diff = LinearRegression()
X_train_diff = np.arange(len(train_diff)).reshape(-1, 1)
y_train_diff = train_diff
model_diff.fit(X_train_diff, y_train_diff)

X_test_diff = np.arange(len(train_diff), len(diff_data)).reshape(-1, 1)
y_test_diff = test_diff
predictions_diff = model_diff.predict(X_test_diff)
mse_after = mean_squared_error(y_test_diff, predictions_diff)
print(f'MSE after first differencing: {mse_after:.2f}')
Applied first-order differencing to the time series to remove the trend and make it stationary.
Tested stationarity before and after differencing using the Augmented Dickey-Fuller test.
Retrained the same linear regression model on the differenced data without changing model parameters.
Results Interpretation

Before differencing: MSE = 1500, data non-stationary (p-value > 0.05)

After first differencing: MSE = 600, data stationary (p-value < 0.05)

Making a time series stationary by differencing helps the model learn better patterns, reducing prediction errors and improving performance.
Bonus Experiment
Try applying second-order differencing if first-order differencing does not achieve stationarity or sufficient error reduction.
💡 Hint
Use np.diff(data, n=2) and repeat the training and evaluation steps.