In time series data, why should we avoid randomly splitting data into training and testing sets?
Think about how time flows and what it means to predict future values.
Random splitting mixes past and future data, letting the model see future information during training, which is unrealistic and causes data leakage.
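A minimal sketch of the leakage problem: shuffling ten time-ordered points before splitting lets observations from late in the series land in the training set while earlier points end up in the test set (the indices shown are illustrative, not a prescribed dataset).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten observations where the index doubles as the time step
data = np.arange(10)

# Random (shuffled) split: "future" points can leak into training
train, test = train_test_split(data, test_size=0.3, random_state=42)
print(sorted(train), sorted(test))
```

Because the split shuffles first, the training set typically contains timestamps later than some test timestamps, which is exactly the leakage the answer describes.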
What is the length of the training and testing sets after this split?
import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(10)
train, test = train_test_split(data, test_size=0.3, random_state=42)
print(len(train), len(test))
Check how train_test_split divides data by default.
By default, train_test_split shuffles the data before splitting; with test_size=0.3 on 10 items, the training set has 7 items and the test set has 3.
Which method correctly splits time series data to avoid data leakage and respect temporal order?
Think about keeping the time order intact.
Using the first 80% of the series for training and the last 20% for testing preserves temporal order and prevents future information from leaking into training.
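The sequential 80/20 split above can be sketched as a simple index cut (the 100-point series here is a stand-in for any time-ordered data):

```python
import numpy as np

data = np.arange(100)  # pretend this is a time-ordered series

# Sequential split: everything before the cut is "past", everything after is "future"
split = int(len(data) * 0.8)
train, test = data[:split], data[split:]
print(len(train), len(test))  # 80 20
```

Every training timestamp precedes every test timestamp, so the model never sees future values during training.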
What is a key consideration when choosing the test size for a time series train-test split?
Think about repeating patterns in time series.
Choosing a test size that covers at least one full seasonal cycle ensures the model is evaluated on every seasonal pattern, not just part of the cycle.
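As an illustration of sizing the test set to a seasonal cycle, assume a hypothetical five years of monthly data with yearly seasonality (period 12); holding out at least 12 points guarantees every seasonal position appears in the test set:

```python
import numpy as np

# Hypothetical: 5 years of monthly observations, yearly seasonality (period = 12)
data = np.arange(60)
season_length = 12

# Hold out at least one full cycle as the test set
test_size = season_length
train, test = data[:-test_size], data[-test_size:]
print(len(train), len(test))  # 48 12
```

The period (12 here) is an assumption for the example; the same idea applies to weekly, daily, or other cycles.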
What error will this code raise when splitting time series data?
import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(10)
train, test = train_test_split(data, test_size=0.3, shuffle=False)
print(train)
print(test)
Check sklearn docs for shuffle parameter behavior.
No error: train_test_split accepts shuffle=False and splits the data sequentially, so train is [0 1 2 3 4 5 6] and test is [7 8 9] — this is in fact the appropriate way to use it for time series.