
Time Series Forecasting in Python with sklearn: Simple Guide

To forecast a time series in Python with sklearn, first reshape the series into a supervised-learning problem: a window of past values becomes the features, and the value that follows the window becomes the target. Then train a regression model such as RandomForestRegressor, and predict future points by feeding it the most recent window of data.

Syntax

Time series forecasting with sklearn involves these steps:

  • Prepare features: Use past time points as input features.
  • Prepare target: Use the next time point as the target to predict.
  • Train model: Fit a regression model like RandomForestRegressor on the features and target.
  • Predict: Use the trained model to forecast future values.
python
from sklearn.ensemble import RandomForestRegressor
import numpy as np

# Example data: time series values
series = np.array([10, 12, 13, 15, 16, 18, 20])

# Prepare features and target
X = []  # past values
y = []  # next value
window_size = 3
for i in range(len(series) - window_size):
    X.append(series[i:i+window_size])
    y.append(series[i+window_size])
X = np.array(X)
y = np.array(y)

# Train model
model = RandomForestRegressor(random_state=42)  # fixed seed so repeated runs give the same result
model.fit(X, y)

# Predict next value
last_window = series[-window_size:].reshape(1, -1)
prediction = model.predict(last_window)
print(prediction)
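Output
A one-element array holding a value between 15 and 20. The exact number depends on the random seed and the scikit-learn version, but it cannot exceed 20: a random forest averages its training targets (15, 16, 18, 20), so it does not extrapolate the upward trend.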
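
The windowing loop above can also be written without an explicit loop. The sketch below assumes NumPy 1.20 or later, where `numpy.lib.stride_tricks.sliding_window_view` is available; the helper name `make_windows` is hypothetical and introduced here only for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(series, window_size):
    """Return (X, y): each row of X holds window_size past values, y is the value that follows."""
    series = np.asarray(series)
    # Build every window of length window_size, then drop the last one
    # because no value follows it to serve as a target
    X = sliding_window_view(series, window_size)[:-1]
    y = series[window_size:]
    return X, y

series = np.array([10, 12, 13, 15, 16, 18, 20])
X, y = make_windows(series, window_size=3)
print(X.shape, y.shape)  # (4, 3) (4,)
print(X[0], y[0])        # [10 12 13] 15
```

The result matches the X and y built by the loop, so either form can be passed to `model.fit`.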

Example

This example shows how to forecast the next value in a simple time series using RandomForestRegressor from sklearn. It uses a sliding window of the past 3 values to predict the next one.

python
from sklearn.ensemble import RandomForestRegressor
import numpy as np

# Sample time series data
series = np.array([100, 102, 101, 105, 110, 108, 115, 120])

# Create features and target using window size 3
window_size = 3
X, y = [], []
for i in range(len(series) - window_size):
    X.append(series[i:i+window_size])
    y.append(series[i+window_size])
X = np.array(X)
y = np.array(y)

# Initialize and train the model
model = RandomForestRegressor(random_state=42)
model.fit(X, y)

# Predict the next value after the last window
last_window = series[-window_size:].reshape(1, -1)
predicted_value = model.predict(last_window)
print(f"Predicted next value: {predicted_value[0]:.2f}")
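Output
Predicted next value: a number between 105.00 and 120.00. The exact figure depends on the scikit-learn version, but a random forest averages its training targets (105, 110, 108, 115, 120), so it cannot predict above 120 even though the series is trending upward.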
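
To forecast several points ahead rather than just one, a common approach is a recursive forecast: predict one step, append the prediction to the window, and repeat. The following is a minimal sketch of that idea using the same sample series; the horizon `n_steps = 3` is an arbitrary value chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

series = np.array([100, 102, 101, 105, 110, 108, 115, 120])
window_size = 3

# Build the sliding-window training set
X = np.array([series[i:i + window_size] for i in range(len(series) - window_size)])
y = series[window_size:]

model = RandomForestRegressor(random_state=42)
model.fit(X, y)

# Recursive forecast: predict one step, append it, slide the window forward
n_steps = 3  # illustrative horizon, not part of the original example
window = list(series[-window_size:])
forecast = []
for _ in range(n_steps):
    next_value = model.predict(np.array(window).reshape(1, -1))[0]
    forecast.append(next_value)
    window = window[1:] + [next_value]  # drop the oldest value, add the prediction

print(forecast)
```

Because each step feeds earlier predictions back in as features, errors tend to accumulate as the horizon grows, so longer forecasts from this approach deserve extra scrutiny.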

Common Pitfalls

Common mistakes in time series forecasting with sklearn include:

  • Not using a sliding window to create features, which means the model has no context of past values.
  • Mixing training and test data without respecting time order, causing data leakage.
  • Ignoring the need to scale or transform data if required by the model.
  • Using classification models instead of regression for forecasting numeric values.

Always split data by time (train on earlier points, test on later points) and create features that represent past values.

python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Wrong: Using raw series as features without windowing
series = np.array([1, 2, 3, 4, 5, 6, 7])
X_wrong = series[:-1].reshape(-1, 1)  # single values, no past context
y_wrong = series[1:]
model_wrong = RandomForestRegressor()
model_wrong.fit(X_wrong, y_wrong)

# Right: Using sliding window
window_size = 3
X_right, y_right = [], []
for i in range(len(series) - window_size):
    X_right.append(series[i:i+window_size])
    y_right.append(series[i+window_size])
X_right = np.array(X_right)
y_right = np.array(y_right)
model_right = RandomForestRegressor()
model_right.fit(X_right, y_right)
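
The advice to split by time can be sketched as follows: build the windows first, then cut the rows at a fixed index so that every test row comes after every training row. The series here is synthetic and generated only for illustration, and the 80/20 split ratio is an assumption, not a rule.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Synthetic series for illustration only: a repeating wave with a little noise
rng = np.random.default_rng(0)
t = np.arange(60)
series = 50 + 10 * np.sin(t / 3) + rng.normal(0, 0.5, size=t.size)

window_size = 3
X = np.array([series[i:i + window_size] for i in range(len(series) - window_size)])
y = series[window_size:]

# Chronological split: first 80% of windows for training, the rest for testing.
# No shuffling, so the model never sees values from the test period.
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = RandomForestRegressor(random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(f"MAE: {mae:.2f}  RMSE: {rmse:.2f}")
```

A shuffled split (for example `train_test_split` with its default `shuffle=True`) would mix later points into the training set, which is the leakage described above.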

Quick Reference

Tips for time series forecasting with sklearn:

  • Use a sliding window to create features from past time points.
  • Choose regression models like RandomForestRegressor, LinearRegression, or GradientBoostingRegressor.
  • Split data by time to avoid data leakage.
  • Evaluate with metrics like Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE).
  • Scale or transform data if the model requires it.
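
For the time-based splitting and MAE tips above, sklearn's `TimeSeriesSplit` offers a cross-validation scheme in which each fold trains on an initial stretch of rows and tests on the block that follows. The sketch below uses a synthetic, noise-free linear series and `n_splits=4`, both chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic, noise-free linear series used only for illustration
series = np.arange(40, dtype=float) * 2 + 10
window_size = 3
X = np.array([series[i:i + window_size] for i in range(len(series) - window_size)])
y = series[window_size:]

# Each fold trains on an initial stretch and tests on the block right after it
tscv = TimeSeriesSplit(n_splits=4)
for train_idx, test_idx in tscv.split(X):
    print(f"train rows 0..{train_idx.max()}  test rows {test_idx.min()}..{test_idx.max()}")

# sklearn scorers are "higher is better", so MAE is exposed as a negative number
scores = cross_val_score(LinearRegression(), X, y, cv=tscv,
                         scoring="neg_mean_absolute_error")
mae_per_fold = -scores
print(mae_per_fold)
```

On this perfectly linear series the per-fold MAE should be close to zero; real data will give larger and more variable errors across folds.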

Key Takeaways

Create features using a sliding window of past values to predict the next time point.
Use regression models from sklearn like RandomForestRegressor for forecasting numeric series.
Always split data by time order to prevent data leakage between training and testing.
Evaluate forecast accuracy with error metrics such as MAE or RMSE.
Avoid using classification models or raw single points without context for time series forecasting.