
Train-test split for time series in ML Python

Introduction
We split time series data into training and testing parts to check whether a model can predict future values without looking ahead at data it would not have had at prediction time. Typical situations:
When you want to predict stock prices using past data.
When forecasting weather based on historical temperature records.
When analyzing sales trends to plan future inventory.
When monitoring sensor data to detect equipment failures.
When building models that learn from sequences over time.
Syntax
ML Python
train_size = int(len(data) * 0.8)
train = data[:train_size]
test = data[train_size:]
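We split the data by slicing so that the order stays intact: everything before the cut is the past (training), everything after it is the future (testing).
Avoid random shuffling. It breaks the time order and lets future observations leak into the training set, which makes test scores look better than the model will actually perform on new data.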
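The slicing pattern above can be wrapped in a small helper. This is only a minimal sketch: `chronological_split` and its `train_fraction` parameter are hypothetical names introduced here for illustration, and the function assumes the data is already sorted from oldest to newest.

```python
def chronological_split(data, train_fraction=0.8):
    """Split an ordered sequence into (train, test) without shuffling.

    Assumes `data` is already sorted from oldest to newest.
    """
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    train_size = int(len(data) * train_fraction)
    # Everything before the cut is the past, everything after is the future
    return data[:train_size], data[train_size:]


readings = list(range(20))  # stand-in for 20 time-ordered observations
train, test = chronological_split(readings, train_fraction=0.8)
print(len(train), len(test))
```

Because the helper only slices, it works the same way on lists, NumPy arrays, and other sequences that support slicing.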
Examples
Splits 10 data points into 7 for training and 3 for testing, keeping order.
ML Python
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
train_size = int(len(data) * 0.7)
train = data[:train_size]
test = data[train_size:]
Using pandas Series, first 80 points for training, last 20 for testing.
ML Python
import pandas as pd
series = pd.Series(range(100))
# .iloc slices by position, so the split does not depend on the index labels
train = series.iloc[:80]
test = series.iloc[80:]
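Real time series usually carry dates rather than plain positions, so the cut is often chosen as a date instead of a count. The sketch below assumes a daily `DatetimeIndex`; the start date and the `cutoff` value are made up for illustration.

```python
import pandas as pd

# Hypothetical daily series: 100 values starting on 2024-01-01
dates = pd.date_range("2024-01-01", periods=100, freq="D")
series = pd.Series(range(100), index=dates)

cutoff = pd.Timestamp("2024-03-21")  # assumed cutoff date, for illustration only

train = series[series.index < cutoff]   # everything strictly before the cutoff
test = series[series.index >= cutoff]   # the cutoff date and everything after

print(len(train), len(test))
```

Using a strict `<` on one side and `>=` on the other guarantees that no observation lands in both parts.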
Sample Model
This code creates a simple linear time series, splits it by time, trains a model on the past, and tests on future points.
ML Python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Create a simple time series: y = 2*x + noise
np.random.seed(0)
x = np.arange(50).reshape(-1, 1)
y = 2 * x.flatten() + np.random.normal(0, 5, 50)

# Split data: first 40 for training, last 10 for testing
train_size = 40
x_train, y_train = x[:train_size], y[:train_size]
x_test, y_test = x[train_size:], y[train_size:]

# Train linear regression model
model = LinearRegression()
model.fit(x_train, y_train)

# Predict on test data
y_pred = model.predict(x_test)

# Calculate mean squared error
mse = mean_squared_error(y_test, y_pred)

print(f"Mean Squared Error on test data: {mse:.2f}")
print(f"Predictions: {y_pred.round(2)}")
Important Notes
Always keep the time order when splitting time series data.
Test data should come after training data in time to simulate real forecasting.
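Random shuffling is fine when the rows are independent of each other (for example, unrelated customer records), but not for time series, where each value depends on what came before it.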
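If you would rather use scikit-learn's `train_test_split` than slice by hand, shuffling can be switched off with `shuffle=False`. A small sketch on made-up data, assuming the rows are already in time order:

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(50).reshape(-1, 1)  # time index 0..49
y = 2 * x.ravel()                 # toy target, for illustration only

# shuffle=False keeps the original order: first 80% train, last 20% test
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, shuffle=False
)

# Every training time step should come before every test time step
print(x_train.max(), x_test.min())
```

Comparing the largest training time step with the smallest test time step is a quick sanity check that the split respects time order.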
Summary
Train-test split for time series keeps data order to respect time flow.
Use slicing to separate past (train) and future (test) data.
This helps check if the model can predict future values well.