Stationarity means that a time series' statistical properties, such as its mean, variance, and autocorrelation, do not change over time. Differencing helps make data stationary by removing trends or seasonality.
Stationarity and differencing in ML with Python
differenced_data = original_data.diff(periods=1)
diff() subtracts the previous value from the current value (x[t] - x[t-1]), which removes a linear trend.
The periods parameter controls how many steps back to subtract.
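To see what periods does, here is a small sketch on a made-up series (the data and variable names are illustrative assumptions, not from the original). The series repeats a 4-step pattern on top of a linear trend. A lag-1 difference still shows the repeating pattern, while a lag-4 difference, which spans one full cycle, removes it and leaves only the constant trend increment.

```python
import pandas as pd

# Illustrative data: a repeating 4-step pattern plus a linear trend of 2 per step
pattern = [10, 20, 15, 5]
data = pd.Series([pattern[t % 4] + 2 * t for t in range(12)])

# periods=1: each value minus the value 1 step earlier
lag1 = data.diff(periods=1)

# periods=4: each value minus the value 4 steps earlier (one full cycle)
lag4 = data.diff(periods=4)

# diff(periods=k) is equivalent to subtracting a k-step shifted copy
assert lag4.equals(data - data.shift(4))

print(lag1.tolist())  # the pattern 12, -3, -8, 7 keeps repeating
print(lag4.tolist())  # [nan, nan, nan, nan, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0]
```

Choosing periods equal to the season length (for example 12 for monthly data with a yearly cycle) is the usual way to apply this kind of seasonal differencing.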
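data.diff()                 # first difference: x[t] - x[t-1]
data.diff(periods=2)        # lag-2 difference: x[t] - x[t-2] (not the same as differencing twice)
data.diff().dropna()        # first difference with the leading NaN removed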
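Applied to a small made-up series, the three calls behave as in this sketch (the values are illustrative). Note that dropna() keeps the original index labels, so the result starts at index 1 rather than 0.

```python
import pandas as pd

# Illustrative series with a quadratic trend
data = pd.Series([1, 4, 9, 16, 25])

first_diff = data.diff()            # x[t] - x[t-1]
lag2_diff = data.diff(periods=2)    # x[t] - x[t-2]
clean_diff = data.diff().dropna()   # first differences without the leading NaN

print(first_diff.tolist())        # [nan, 3.0, 5.0, 7.0, 9.0]
print(lag2_diff.tolist())         # [nan, nan, 8.0, 12.0, 16.0]
print(clean_diff.tolist())        # [3.0, 5.0, 7.0, 9.0]
print(clean_diff.index.tolist())  # [1, 2, 3, 4]
```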
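This code creates a time series with a linear trend, uses the Augmented Dickey-Fuller (ADF) test to check whether it is stationary, applies first-order differencing to remove the trend, and runs the test again. The ADF null hypothesis is that the series has a unit root (is non-stationary), so a p-value below 0.05 rejects that hypothesis and suggests the series is stationary.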
import pandas as pd
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Create a time series with a linear trend plus random noise
# (100 points: the ADF test is unreliable on very short series)
np.random.seed(0)
time = pd.Series(np.arange(100))
data = time * 2 + np.random.normal(size=100)

# Check if the data is stationary using the Augmented Dickey-Fuller test
result_before = adfuller(data)

# Apply first-order differencing to remove the trend; drop the leading NaN
diff_data = data.diff().dropna()

# Check stationarity again
result_after = adfuller(diff_data)

# adfuller returns (statistic, p-value, ...)
print(f"ADF Statistic before differencing: {result_before[0]:.4f}")
print(f"p-value before differencing: {result_before[1]:.4f}")
print(f"ADF Statistic after differencing: {result_after[0]:.4f}")
print(f"p-value after differencing: {result_after[1]:.4f}")
Differencing leaves missing values (NaN) at the start of the series, one for each step of periods; handle them before modeling, for example by dropping or filling them.
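A minimal sketch of both options on an illustrative series; which option fits depends on the model, and filling with 0 is only one possible choice. The last part assumes a hypothetical feature table where the target is the next value; calling dropna() on the whole DataFrame removes incomplete rows from every column at once, so features and target stay aligned.

```python
import pandas as pd

# Illustrative data (not from the original text)
data = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 110.0])
diffed = data.diff(periods=2)

# diff(periods=k) leaves exactly k NaN values at the start
n_missing = int(diffed.isna().sum())

# Option 1: drop the undefined leading values (the series gets shorter)
dropped = diffed.dropna()

# Option 2: fill them, here with 0 (keeps the length, but the model
# will treat these artificial zeros as real observations)
filled = diffed.fillna(0)

# Hypothetical feature table: differenced feature and next-step target.
# dropna() removes row 0 (no diff) and the last row (no target).
df = pd.DataFrame({"value_diff": data.diff(), "target": data.shift(-1)})
df = df.dropna()

print(n_missing)         # 2
print(dropped.tolist())  # [1.0, 3.0, 6.0, 5.0]
print(filled.tolist())   # [0.0, 0.0, 1.0, 3.0, 6.0, 5.0]
```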
Stationarity is important because many time series models, such as ARMA, assume that the mean and variance of the data stay constant over time.
Sometimes multiple differencing steps are needed to achieve stationarity; for example, a quadratic trend has to be differenced twice (data.diff().diff()).
Stationarity means a series' statistical properties, such as its mean and variance, stay constant over time.
Differencing removes trends or seasonality to help make data stationary.
Testing stationarity before and after differencing helps prepare data for time series models.