
Stationarity and differencing in ML Python - ML Experiment: Train & Evaluate

Experiment - Stationarity and differencing
Problem: You have a time series dataset with a clear upward trend. The model you built to forecast future values performs poorly because the data is not stationary.
Current Metrics: Mean Squared Error (MSE) on validation set: 1500
Issue: The time series is non-stationary, so its mean shifts over time and the model struggles to learn stable patterns. This leads to high prediction error.
Your Task
Make the time series stationary by applying differencing, then retrain the model to reduce validation MSE below 800.
You can only apply differencing once or twice.
Do not change the model architecture or hyperparameters.
Use the same train-test split.
Solution
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate sample non-stationary time series data
np.random.seed(42)
time = np.arange(100)
data = 0.5 * time + np.random.normal(size=100)  # Upward trend

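# Function to test stationarity with the Augmented Dickey-Fuller (ADF) test
def test_stationarity(timeseries):
    # adfuller returns a tuple: (test statistic, p-value, ...)
    result = adfuller(timeseries)
    print(f'ADF Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    # A p-value below 0.05 rejects the unit-root null, suggesting stationarity
    return result[1] < 0.05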

# Check stationarity before differencing
print('Before differencing:')
stationary = test_stationarity(data)

# Prepare train-test split
train_size = 80
train, test = data[:train_size], data[train_size:]

# Train simple linear regression model on original data
model = LinearRegression()
X_train = np.arange(train_size).reshape(-1, 1)
y_train = train
model.fit(X_train, y_train)

X_test = np.arange(train_size, 100).reshape(-1, 1)
y_test = test
predictions = model.predict(X_test)
mse_before = mean_squared_error(y_test, predictions)
print(f'MSE before differencing: {mse_before:.2f}')

# Apply first-order differencing
diff_data = np.diff(data, n=1)

# Check stationarity after differencing
print('\nAfter first differencing:')
stationary_diff = test_stationarity(diff_data)

# Prepare train-test split for differenced data
train_diff, test_diff = diff_data[:train_size-1], diff_data[train_size-1:]

# Train model on differenced data
model_diff = LinearRegression()
X_train_diff = np.arange(len(train_diff)).reshape(-1, 1)
y_train_diff = train_diff
model_diff.fit(X_train_diff, y_train_diff)

X_test_diff = np.arange(len(train_diff), len(diff_data)).reshape(-1, 1)
y_test_diff = test_diff
predictions_diff = model_diff.predict(X_test_diff)
mse_after = mean_squared_error(y_test_diff, predictions_diff)
print(f'MSE after first differencing: {mse_after:.2f}')
Applied first-order differencing to the time series to remove the trend and make it stationary.
Tested stationarity before and after differencing using the Augmented Dickey-Fuller test.
Retrained the same linear regression model on the differenced data without changing model parameters.
Results Interpretation

Before differencing: MSE = 1500, data non-stationary (p-value > 0.05)

After first differencing: MSE = 600, data stationary (p-value < 0.05)

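Differencing removes the trend, so the series becomes stationary and the model has a more stable target to learn, which lowers prediction error. The figures above are the exercise's reference values; the synthetic data in the solution code will not reproduce these exact numbers. The second MSE is also measured on differenced values, so it is not on the same scale as the first.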
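To compare errors on a common scale, predictions of differences can be converted back to the original levels. The sketch below is one way this might be done. It assumes a hypothetical stand-in predictor (the mean training difference) instead of the trained model, and the names `pred_diff` and `pred_levels` are illustrative only.

```python
import numpy as np

# Synthetic series similar to the solution code: linear trend plus noise.
rng = np.random.default_rng(42)
data = 0.5 * np.arange(100) + rng.normal(size=100)

train_size = 80
diff_data = np.diff(data, n=1)            # 99 first differences
train_diff = diff_data[:train_size - 1]   # differences inside the training span

# Hypothetical stand-in for a model: predict the mean training difference
# for every future step.
n_test = len(data) - train_size
pred_diff = np.full(n_test, train_diff.mean())

# Invert the differencing: start from the last observed training value
# and accumulate the predicted changes.
last_train_value = data[train_size - 1]
pred_levels = last_train_value + np.cumsum(pred_diff)

# MSE on the original scale, comparable with a model fitted to raw values.
mse_levels = np.mean((data[train_size:] - pred_levels) ** 2)
print(f"MSE on original scale: {mse_levels:.2f}")
```

The cumulative sum is the inverse of first-order differencing, which is why the last training value is needed as the starting point.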
Bonus Experiment
Try applying second-order differencing if first-order differencing does not achieve stationarity or sufficient error reduction.
💡 Hint
Use np.diff(data, n=2) and repeat the training and evaluation steps.
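Each round of differencing shortens the series by one point, so the split index has to move accordingly. The following is a small sketch of that bookkeeping for second-order differencing, assuming the same 80/20 split as the solution; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
data = 0.5 * np.arange(100) + rng.normal(size=100)
train_size = 80

# Second-order differencing is first-order differencing applied twice.
diff2 = np.diff(data, n=2)
assert np.allclose(diff2, np.diff(np.diff(data)))

# diff2[i] is built from data[i], data[i + 1], and data[i + 2], so the
# first train_size - 2 values use training observations only.
train_diff2 = diff2[:train_size - 2]
test_diff2 = diff2[train_size - 2:]

print(len(diff2), len(train_diff2), len(test_diff2))  # 98 78 20
```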

Practice

1. What does it mean when a time series is stationary?
easy
A. It has missing values that need to be filled
B. It has a clear upward or downward trend
C. It contains seasonal patterns repeating over fixed intervals
D. Its statistical properties like mean and variance do not change over time

Solution

  1. Step 1: Understand stationarity definition

    Stationarity means the data's mean, variance, and other statistics stay constant over time.
  2. Step 2: Compare options to definition

    Only option D, "Its statistical properties like mean and variance do not change over time", describes constant statistical properties; the other options describe missing data, trends, or seasonality.
  3. Final Answer:

    Its statistical properties like mean and variance do not change over time -> Option D
  4. Quick Check:

    Stationary = constant mean/variance [OK]
Hint: Stationary means stats don't change over time [OK]
Common Mistakes:
  • Confusing stationarity with trend presence
  • Thinking seasonality means stationarity
  • Assuming missing data affects stationarity
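As a rough illustration of "constant mean over time", the sketch below compares the mean of the first and second half of a series. It is an informal check rather than a statistical test, and the helper name `half_means` is hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
trending = pd.Series(0.5 * np.arange(200) + rng.normal(size=200))
noise = pd.Series(rng.normal(size=200))

def half_means(series):
    """Return the mean of the first half and of the second half."""
    half = len(series) // 2
    return series.iloc[:half].mean(), series.iloc[half:].mean()

# The trending series has very different means in each half;
# the pure-noise series has roughly the same mean in both.
print("trending:", half_means(trending))
print("noise:   ", half_means(noise))
```

A formal test such as the Augmented Dickey-Fuller test is still the better tool for a decision; this only makes the definition concrete.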
2. Which Python code correctly applies first-order differencing to a pandas Series data?
easy
A. data.dropna()
B. data.diff(1)
C. data.cumsum()
D. data.shift(1)

Solution

  1. Step 1: Recall differencing method in pandas

    The diff(1) method calculates the difference between current and previous values, performing first-order differencing.
  2. Step 2: Check other options

    shift(1) only shifts the data, cumsum() computes a running total, and dropna() removes missing values; none of them performs differencing.
  3. Final Answer:

    data.diff(1) -> Option B
  4. Quick Check:

    First difference = diff(1) [OK]
Hint: Use diff(1) for first-order differencing in pandas [OK]
Common Mistakes:
  • Using shift instead of diff for differencing
  • Confusing cumulative sum with differencing
  • Dropping NaNs instead of differencing
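The relationship between diff and shift can be checked directly: first-order differencing is the series minus its one-step shifted copy. A short sketch with made-up values:

```python
import pandas as pd

data = pd.Series([3.0, 5.0, 9.0, 10.0])

manual = data - data.shift(1)   # current value minus previous value
builtin = data.diff(1)          # pandas first-order differencing

# Both start with NaN because the first value has no predecessor.
print(builtin.equals(manual))   # True
```

This is why shift(1) alone is a common mistake: it is only half of the operation.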
3. Given this code snippet:
import pandas as pd
series = pd.Series([10, 12, 15, 20, 25])
diff_series = series.diff(1).dropna()
print(diff_series.tolist())

What is the output?
medium
A. [0, 2, 3, 5, 5]
B. [10, 12, 15, 20, 25]
C. [2.0, 3.0, 5.0, 5.0]
D. [nan, 2, 3, 5, 5]

Solution

  1. Step 1: Calculate first differences

    Differences: 12-10=2, 15-12=3, 20-15=5, 25-20=5.
  2. Step 2: Drop NaN and print list

    The first element of the differenced series is NaN because it has no predecessor; dropna() removes it, so the output is [2.0, 3.0, 5.0, 5.0].
  3. Final Answer:

    [2.0, 3.0, 5.0, 5.0] -> Option C
  4. Quick Check:

    Diff values = [2.0,3.0,5.0,5.0] [OK]
Hint: First diff drops first NaN, output is differences list [OK]
Common Mistakes:
  • Including NaN in output list
  • Printing original series instead of differences
  • Confusing shift with diff output
4. You applied first-order differencing to a time series but it still shows a trend. What is the likely issue?
medium
A. The series needs second-order differencing to remove the trend
B. You should use cumulative sum instead of differencing
C. The series is already stationary and differencing added noise
D. You forgot to normalize the data before differencing

Solution

  1. Step 1: Understand differencing orders

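    First-order differencing removes a linear trend. If a trend remains (for example, because the original trend was quadratic), a second round of differencing may be needed.
  2. Step 2: Evaluate other options

    A cumulative sum is the inverse of differencing and would add trend rather than remove it; normalization rescales values but does not remove a trend; and a series that still shows a trend is not stationary, so option C does not apply.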
  3. Final Answer:

    The series needs second-order differencing to remove the trend -> Option A
  4. Quick Check:

    Trend remains -> try second differencing [OK]
Hint: If trend remains, increase differencing order [OK]
Common Mistakes:
  • Using cumulative sum instead of differencing
  • Assuming normalization removes trend
  • Stopping at first differencing without checking stationarity
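A quadratic trend is a simple case where one round of differencing is not enough. The sketch below uses a noise-free series so the effect is easy to see; real data would include noise.

```python
import pandas as pd

t = pd.Series(range(8), dtype=float)
series = t ** 2                          # 0, 1, 4, 9, ... (quadratic trend)

first = series.diff().dropna()           # 1, 3, 5, ... still trending upward
second = series.diff().diff().dropna()   # 2, 2, 2, ... constant, trend removed

print(first.tolist())
print(second.tolist())
```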
5. You have a monthly sales time series with a yearly seasonal pattern and an upward trend. Which differencing approach should you apply to make it stationary?
hard
A. Apply first-order differencing followed by seasonal differencing with lag 12
B. Apply only first-order differencing
C. Apply only seasonal differencing with lag 12
D. Apply logarithm transformation without differencing

Solution

  1. Step 1: Identify components to remove

    The series has both trend and yearly seasonality, so both need to be removed for stationarity.
  2. Step 2: Choose differencing methods

    First-order differencing removes trend; seasonal differencing with lag 12 removes yearly seasonality.
  3. Step 3: Combine differencing steps

    Applying first-order differencing and then seasonal differencing addresses both components. The two operations commute, so applying them in the opposite order gives the same result.
  4. Final Answer:

    Apply first-order differencing followed by seasonal differencing with lag 12 -> Option A
  5. Quick Check:

    Trend + seasonality -> first + seasonal differencing [OK]
Hint: Remove trend then seasonality with two differencing steps [OK]
Common Mistakes:
  • Applying only one differencing type ignoring trend or seasonality
  • Using log transform alone to fix non-stationarity
  • Confusing seasonal lag with differencing order
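The two-step approach can be sketched in pandas as follows, assuming a synthetic monthly series with a linear trend and a 12-month sine pattern (no noise, so the result is easy to verify). The series and variable names are illustrative.

```python
import numpy as np
import pandas as pd

months = np.arange(48)
seasonal = 10 * np.sin(2 * np.pi * months / 12)   # repeats every 12 months
sales = pd.Series(100 + 2.0 * months + seasonal)  # trend plus seasonality

detrended = sales.diff(1)                  # removes the linear trend
stationary = detrended.diff(12).dropna()   # removes the yearly pattern

# With no noise, what remains is zero up to floating-point error.
print(len(stationary), float(stationary.abs().max()))
```

Note that diff(12) is a lag-12 difference, which is not the same as differencing twelve times; the lag sets how far back to subtract, while the order counts how many times differencing is applied.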