
Stationarity and differencing in ML Python

Introduction

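A time series is stationary when its statistical properties, such as its mean, variance, and autocorrelation, do not change over time. Differencing replaces each value with its change from an earlier value, which helps make data stationary by removing trends or seasonality.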

Checking stationarity and applying differencing is useful in situations like these:

When you want to predict future sales from past sales data whose level changes over time.
When analyzing temperature data that shows seasonal patterns.
When working with stock prices that have trends and fluctuations.
When preparing time series data for models that require stable patterns.
When you want to compare data points fairly over time without trend effects.
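To make the definition of stationarity concrete, the following minimal sketch compares the mean of the first and second half of a series before and after differencing. The series, the helper name `half_means`, and the half-split check are assumptions chosen for illustration; this is a rough visual check, not a formal test.

```python
import numpy as np
import pandas as pd

# Hypothetical series (not from the lesson): a linear trend plus random noise.
rng = np.random.default_rng(0)
trend_series = pd.Series(0.5 * np.arange(200) + rng.normal(size=200))


def half_means(series):
    """Return the mean of the first half and of the second half of a series."""
    mid = len(series) // 2
    return series.iloc[:mid].mean(), series.iloc[mid:].mean()


# With a trend, the two halves have very different means (non-stationary).
first, second = half_means(trend_series)
print(f"Trending series:    first half {first:.2f}, second half {second:.2f}")

# After differencing, both halves hover around the same value (the slope).
first_d, second_d = half_means(trend_series.diff().dropna())
print(f"After differencing: first half {first_d:.2f}, second half {second_d:.2f}")
```

If the two halves of a series have clearly different means, the series is unlikely to be stationary; similar means alone do not prove stationarity, because the variance or autocorrelation may still change.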
Syntax
ML Python
differenced_data = original_data.diff(periods=1)

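diff() subtracts an earlier value from the current value, so the result measures the change between steps. With the default periods=1 it subtracts the immediately preceding value, which removes a linear trend.

The periods parameter controls how many steps back to subtract.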
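As a small worked example of what diff() computes, the sketch below applies it to a hand-made series with periods=1 and periods=2. The values in `s` are made up for illustration.

```python
import pandas as pd

# Hypothetical values chosen so the differences are easy to check by hand.
s = pd.Series([10, 12, 15, 19])

lag1 = s.diff()           # s[t] - s[t-1]: NaN, 2, 3, 4
lag2 = s.diff(periods=2)  # s[t] - s[t-2]: NaN, NaN, 5, 7

print(pd.DataFrame({"original": s, "diff_1": lag1, "diff_2": lag2}))
```

The first `periods` rows are NaN because there is no earlier value to subtract from them.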

Examples
Subtracts the previous value from each value (lag 1) to remove simple trends.
ML Python
data.diff()
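Subtracts the value two steps back (lag 2). This is a single difference taken at a longer lag, not differencing twice; a lag equal to the length of a repeating cycle removes that cycle.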
ML Python
data.diff(periods=2)
Drops the leading missing value (NaN) that differencing creates, because the first observation has no earlier value to subtract.
ML Python
data.diff().dropna()
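The introduction mentions that differencing can also remove seasonality. A minimal sketch of that idea, assuming monthly data with a 12-month cycle (the series and the period of 12 are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: a repeating 12-month pattern plus a slow upward trend.
months = np.arange(48)
seasonal_pattern = 10 * np.sin(2 * np.pi * months / 12)
data = pd.Series(seasonal_pattern + 0.5 * months)

# Seasonal difference: subtract the value from the same month one year earlier.
seasonal_diff = data.diff(periods=12).dropna()

print(seasonal_diff.describe())
```

Because each value is compared with the same point in the previous cycle, the repeating pattern cancels out and only the year-over-year change remains.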
Sample Model

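This code creates a time series with a trend, runs the Augmented Dickey-Fuller (ADF) test on it, applies first-order differencing to remove the trend, and runs the test again. The null hypothesis of the ADF test is that the series has a unit root (is non-stationary), so a p-value below 0.05 is evidence that the series is stationary, while a larger p-value means non-stationarity cannot be ruled out. With only 10 observations the test has little power, so treat the printed values as an illustration of the workflow rather than a reliable result.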

ML Python
import pandas as pd
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Create a simple time series with a trend
np.random.seed(0)
time = pd.Series(np.arange(10))
data = time * 2 + np.random.normal(size=10)

# Check if data is stationary using Augmented Dickey-Fuller test
result_before = adfuller(data)

# Apply differencing to remove trend
diff_data = data.diff().dropna()

# Check stationarity again
result_after = adfuller(diff_data)

print(f"ADF Statistic before differencing: {result_before[0]:.4f}")
print(f"p-value before differencing: {result_before[1]:.4f}")
print(f"ADF Statistic after differencing: {result_after[0]:.4f}")
print(f"p-value after differencing: {result_after[1]:.4f}")
Important Notes

Differencing always leaves missing values (NaN) in the first periods rows; handle them before modeling (e.g., drop or fill), since many functions do not accept NaN.

Stationarity is important because many time series models assume that properties such as the mean and variance stay constant over time.

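Sometimes multiple differencing steps are needed to achieve stationarity. Re-test after each step and stop once the series looks stationary, because differencing more than necessary adds noise.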
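As a sketch of why a second differencing step can be needed, consider a series with a quadratic trend (a made-up example): one difference leaves a linear trend, and a second difference removes it.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a quadratic trend: 0, 1, 4, 9, 16, ...
t = np.arange(8)
quadratic = pd.Series(t ** 2)

first = quadratic.diff()   # 1, 3, 5, ...: still increasing, so a trend remains
second = first.diff()      # constant 2: the trend is gone

# Note: this is different from diff(periods=2), which is a single lag-2
# difference and still trends upward for this series.
lag2 = quadratic.diff(periods=2)

print(pd.DataFrame({"original": quadratic, "first": first, "second": second, "lag2": lag2}))
```

Each call to diff() adds one more leading NaN, so two differencing steps leave two missing values at the start.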

Summary

Stationarity means a series' statistical properties, such as its mean and variance, stay consistent over time.

Differencing removes trends or seasonality to help make data stationary.

Testing stationarity before and after differencing helps prepare data for time series models.