Time Series Forecasting in Python with sklearn: Simple Guide
To do time series forecasting in Python with sklearn, first turn the series into a supervised learning problem: past values become the features and the value that follows becomes the target. Then train a regression model such as RandomForestRegressor, and use it to predict future points from the most recent data.
Syntax
Time series forecasting with sklearn involves these steps:
- Prepare features: Use past time points as input features.
- Prepare target: Use the next time point as the target to predict.
- Train model: Fit a regression model like RandomForestRegressor on the features and target.
- Predict: Use the trained model to forecast future values.
python
from sklearn.ensemble import RandomForestRegressor
import numpy as np

# Example data: time series values
series = np.array([10, 12, 13, 15, 16, 18, 20])

# Prepare features and target
X = []  # past values
y = []  # next value
window_size = 3
for i in range(len(series) - window_size):
    X.append(series[i:i+window_size])
    y.append(series[i+window_size])
X = np.array(X)
y = np.array(y)

# Train model
model = RandomForestRegressor()
model.fit(X, y)

# Predict next value
last_window = series[-window_size:].reshape(1, -1)
prediction = model.predict(last_window)
print(prediction)
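Output
The script prints a one-element array. The exact value varies from run to run because no random_state is set, but it always falls between 15 and 20, the smallest and largest training targets: a random forest averages target values it saw during training, so it cannot predict beyond them.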
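The windowing loop above can also be wrapped in a small reusable function. The sketch below is one possible way to do it and is not part of sklearn: `make_windows` is a hypothetical helper name, and it assumes NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(series, window_size):
    """Return (X, y): each row of X holds window_size past values, y holds the value that follows."""
    series = np.asarray(series)
    if len(series) <= window_size:
        raise ValueError("series must be longer than window_size")
    # Every window of length window_size; the last one has no following value, so drop it
    X = sliding_window_view(series, window_size)[:-1]
    y = series[window_size:]
    return X, y


series = np.array([10, 12, 13, 15, 16, 18, 20])
X, y = make_windows(series, window_size=3)
print(X.shape, y.shape)  # (4, 3) (4,)
print(X[0], y[0])        # [10 12 13] 15
```

It builds the same X and y as the explicit loop, so the result can be passed straight to model.fit(X, y).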
Example
This example shows how to forecast the next value in a simple time series using RandomForestRegressor from sklearn. It uses a sliding window of the past 3 values to predict the next one.
python
from sklearn.ensemble import RandomForestRegressor
import numpy as np

# Sample time series data
series = np.array([100, 102, 101, 105, 110, 108, 115, 120])

# Create features and target using window size 3
window_size = 3
X, y = [], []
for i in range(len(series) - window_size):
    X.append(series[i:i+window_size])
    y.append(series[i+window_size])
X = np.array(X)
y = np.array(y)

# Initialize and train the model
model = RandomForestRegressor(random_state=42)
model.fit(X, y)

# Predict the next value after the last window
last_window = series[-window_size:].reshape(1, -1)
predicted_value = model.predict(last_window)
print(f"Predicted next value: {predicted_value[0]:.2f}")
Output
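The script prints "Predicted next value:" followed by a number. The exact number can differ between scikit-learn versions, but it cannot exceed 120.00, the largest training target, because a random forest averages target values seen during training and does not extrapolate an upward trend.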
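To forecast several future points rather than one, a common approach is to feed each prediction back in as the newest value of the window. The sketch below illustrates that recursive idea under the same setup as the example; the horizon of 3 steps and the variable names are illustrative choices, not something sklearn prescribes.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

series = np.array([100, 102, 101, 105, 110, 108, 115, 120])
window_size = 3

# Sliding-window features and targets, as in the example above
X = np.array([series[i:i + window_size] for i in range(len(series) - window_size)])
y = series[window_size:]

model = RandomForestRegressor(random_state=42)
model.fit(X, y)

# Recursive forecast: predict one step, slide the window forward, repeat
horizon = 3  # number of future points to forecast (illustrative)
window = list(series[-window_size:])
forecast = []
for _ in range(horizon):
    next_value = model.predict(np.array(window).reshape(1, -1))[0]
    forecast.append(next_value)
    window = window[1:] + [next_value]  # drop the oldest value, append the prediction

print(forecast)
```

Because later steps are predicted from earlier predictions, errors can compound as the horizon grows, so short horizons are usually more trustworthy.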
Common Pitfalls
Common mistakes in time series forecasting with sklearn include:
- Not using a sliding window to create features, which means the model has no context of past values.
- Mixing training and test data without respecting time order, causing data leakage.
- Ignoring the need to scale or transform data if required by the model.
- Using classification models instead of regression for forecasting numeric values.
Always split data by time (train on earlier points, test on later points) and create features that represent past values.
python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Wrong: Using raw series as features without windowing
series = np.array([1, 2, 3, 4, 5, 6, 7])
X_wrong = series[:-1].reshape(-1, 1)  # single values, no past context
y_wrong = series[1:]
model_wrong = RandomForestRegressor()
model_wrong.fit(X_wrong, y_wrong)

# Right: Using sliding window
window_size = 3
X_right, y_right = [], []
for i in range(len(series) - window_size):
    X_right.append(series[i:i+window_size])
    y_right.append(series[i+window_size])
X_right = np.array(X_right)
y_right = np.array(y_right)
model_right = RandomForestRegressor()
model_right.fit(X_right, y_right)
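The advice to split by time can be sketched in two ways: a simple holdout that keeps the most recent windows for testing, and sklearn's TimeSeriesSplit for cross-validation. The series values, the 80/20 ratio, and n_splits=3 below are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Illustrative series (made-up values) turned into sliding-window features
series = np.array([100, 102, 101, 105, 110, 108, 115, 120, 118, 125, 130, 128])
window_size = 3
X = np.array([series[i:i + window_size] for i in range(len(series) - window_size)])
y = series[window_size:]

# Simple holdout: the earliest 80% of windows train, the most recent 20% test
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Cross-validation that respects time order: every fold tests on rows
# that come after all of the rows it trains on
folds = list(TimeSeriesSplit(n_splits=3).split(X))
for train_idx, test_idx in folds:
    print("train rows:", train_idx, "test rows:", test_idx)
```

Neither split shuffles the rows, which is the point: a shuffled split such as train_test_split with its default settings would let the model train on windows that come after the ones it is tested on.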
Quick Reference
Tips for time series forecasting with sklearn:
- Use a sliding window to create features from past time points.
- Choose regression models like RandomForestRegressor, LinearRegression, or GradientBoostingRegressor.
- Split data by time to avoid data leakage.
- Evaluate with metrics like Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE).
- Scale or transform data if the model requires it.
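The evaluation and scaling tips can be combined in one short sketch. It assumes a made-up series, a holdout of the last 3 windows, and a LinearRegression wrapped in a pipeline with StandardScaler; RMSE is computed as the square root of mean_squared_error so the code does not depend on a particular scikit-learn version.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative series (made-up values) turned into sliding-window features
series = np.array([100, 102, 101, 105, 110, 108, 115, 120, 118, 125, 130, 128], dtype=float)
window_size = 3
X = np.array([series[i:i + window_size] for i in range(len(series) - window_size)])
y = series[window_size:]

# Time-ordered holdout: the last 3 windows are kept for testing
split = len(X) - 3
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# The scaler sits inside the pipeline, so it is fitted on the training rows only
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")
```

Both metrics are in the same units as the series. RMSE penalizes large misses more heavily than MAE, so it is never smaller than MAE on the same predictions.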
Key Takeaways
- Create features using a sliding window of past values to predict the next time point.
- Use regression models from sklearn like RandomForestRegressor for forecasting numeric series.
- Always split data by time order to prevent data leakage between training and testing.
- Evaluate forecast accuracy with error metrics such as MAE or RMSE.
- Avoid using classification models or raw single points without context for time series forecasting.