Stationarity means a data pattern that does not change over time. Differencing helps make data stationary by removing trends or seasonality.
Stationarity and differencing in ML Python
Start learning this pattern below
Jump into concepts and practice - no test required
differenced_data = original_data.diff(periods=1)diff() subtracts the previous value from the current value to remove trends.
The periods parameter controls how many steps back to subtract.
data.diff()
data.diff(periods=2)data.diff().dropna()
This code creates a time series with a trend, tests if it is stationary, then applies differencing to remove the trend and tests stationarity again. The p-value shows if the data is stationary (lower than 0.05 means stationary).
import pandas as pd import numpy as np from statsmodels.tsa.stattools import adfuller # Create a simple time series with a trend np.random.seed(0) time = pd.Series(np.arange(10)) data = time * 2 + np.random.normal(size=10) # Check if data is stationary using Augmented Dickey-Fuller test result_before = adfuller(data) # Apply differencing to remove trend diff_data = data.diff().dropna() # Check stationarity again result_after = adfuller(diff_data) print(f"ADF Statistic before differencing: {result_before[0]:.4f}") print(f"p-value before differencing: {result_before[1]:.4f}") print(f"ADF Statistic after differencing: {result_after[0]:.4f}") print(f"p-value after differencing: {result_after[1]:.4f}")
Differencing can create missing values at the start; always handle them (e.g., drop or fill).
Stationarity is important because many time series models assume stable data patterns.
Sometimes multiple differencing steps are needed to achieve stationarity.
Stationarity means data patterns stay consistent over time.
Differencing removes trends or seasonality to help make data stationary.
Testing stationarity before and after differencing helps prepare data for time series models.
Practice
stationary?Solution
Step 1: Understand stationarity definition
Stationarity means the data's mean, variance, and other statistics stay constant over time.Step 2: Compare options to definition
Only Its statistical properties like mean and variance do not change over time describes constant statistical properties; others describe trends, seasonality, or missing data.Final Answer:
Its statistical properties like mean and variance do not change over time -> Option DQuick Check:
Stationary = constant mean/variance [OK]
- Confusing stationarity with trend presence
- Thinking seasonality means stationarity
- Assuming missing data affects stationarity
data?Solution
Step 1: Recall differencing method in pandas
Thediff(1)method calculates the difference between current and previous values, performing first-order differencing.Step 2: Check other options
shift(1)shifts data,cumsum()sums cumulatively, anddropna()removes missing values, none perform differencing.Final Answer:
data.diff(1) -> Option BQuick Check:
First difference = diff(1) [OK]
- Using shift instead of diff for differencing
- Confusing cumulative sum with differencing
- Dropping NaNs instead of differencing
import pandas as pd series = pd.Series([10, 12, 15, 20, 25]) diff_series = series.diff(1).dropna() print(diff_series.tolist())
What is the output?
Solution
Step 1: Calculate first differences
Differences: 12-10=2, 15-12=3, 20-15=5, 25-20=5.Step 2: Drop NaN and print list
The first difference is NaN, dropped bydropna(), so output is [2.0, 3.0, 5.0, 5.0].Final Answer:
[2.0, 3.0, 5.0, 5.0] -> Option CQuick Check:
Diff values = [2.0,3.0,5.0,5.0] [OK]
- Including NaN in output list
- Printing original series instead of differences
- Confusing shift with diff output
Solution
Step 1: Understand differencing orders
First-order differencing removes linear trends; if trend remains, higher order differencing may be needed.Step 2: Evaluate other options
Cumulative sum adds trend, normalization doesn't remove trend, and differencing adding noise means series was not stationary before.Final Answer:
The series needs second-order differencing to remove the trend -> Option AQuick Check:
Trend remains -> try second differencing [OK]
- Using cumulative sum instead of differencing
- Assuming normalization removes trend
- Stopping at first differencing without checking stationarity
Solution
Step 1: Identify components to remove
The series has both trend and yearly seasonality, so both need to be removed for stationarity.Step 2: Choose differencing methods
First-order differencing removes trend; seasonal differencing with lag 12 removes yearly seasonality.Step 3: Combine differencing steps
Applying first-order differencing then seasonal differencing is the correct approach to achieve stationarity.Final Answer:
Apply first-order differencing followed by seasonal differencing with lag 12 -> Option AQuick Check:
Trend + seasonality -> first + seasonal differencing [OK]
- Applying only one differencing type ignoring trend or seasonality
- Using log transform alone to fix non-stationarity
- Confusing seasonal lag with differencing order
