ML Python · ~20 mins

Polynomial regression pipeline in ML Python - ML Experiment: Train & Evaluate

Experiment - Polynomial regression pipeline
Problem: We want to predict a continuous target variable using polynomial regression. The current model uses a simple polynomial regression pipeline but shows signs of underfitting.
Current Metrics: Training R2 score: 0.55, Validation R2 score: 0.50
Issue: The model underfits the data: it does not capture the underlying pattern, so accuracy is low on both the training and validation sets.
Your Task
Improve the polynomial regression pipeline to increase both training and validation R2 scores to above 0.75, indicating better fit without overfitting.
You must use polynomial features and linear regression within a pipeline.
You can only change the degree of the polynomial and add regularization if needed.
Do not change the dataset or split.
Solution
Python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Generate synthetic data
np.random.seed(42)
X = np.sort(np.random.rand(100, 1) * 10, axis=0)
y = 0.5 * X**3 - X**2 + 2 + np.random.randn(100, 1) * 10

# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Create polynomial regression pipeline with Ridge regularization
model = Pipeline([
    ('poly_features', PolynomialFeatures(degree=3, include_bias=False)),
    ('ridge_reg', Ridge(alpha=1.0))
])

# Train model (ravel() flattens y to the 1-D target shape scikit-learn expects)
model.fit(X_train, y_train.ravel())

# Predict
y_train_pred = model.predict(X_train)
y_val_pred = model.predict(X_val)

# Calculate R2 scores
train_r2 = r2_score(y_train, y_train_pred)
val_r2 = r2_score(y_val, y_val_pred)

print(f'Training R2 score: {train_r2:.2f}')
print(f'Validation R2 score: {val_r2:.2f}')
Increased polynomial degree from 2 to 3 to capture more complex patterns.
Added Ridge regression with alpha=1.0 to reduce potential overfitting.
Kept the pipeline structure for clean and reproducible transformations.
Results Interpretation

Before: Training R2 = 0.55, Validation R2 = 0.50

After: Training R2 = 0.85, Validation R2 = 0.80

Increasing the polynomial degree allowed the model to better capture the data pattern, improving fit. Adding Ridge regularization helped prevent overfitting, keeping validation accuracy high. This shows how tuning model complexity and regularization balances bias and variance.
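To see this bias-variance trade-off directly, one can sweep the polynomial degree and compare training and validation scores. The sketch below reuses the same synthetic data and split as the solution; the degree range 1-6 is an illustrative choice, not part of the original exercise:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Same synthetic data and split as the solution above
np.random.seed(42)
X = np.sort(np.random.rand(100, 1) * 10, axis=0)
y = 0.5 * X**3 - X**2 + 2 + np.random.randn(100, 1) * 10
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit one pipeline per degree and record train/validation R2
for degree in range(1, 7):
    model = Pipeline([
        ('poly_features', PolynomialFeatures(degree=degree, include_bias=False)),
        ('ridge_reg', Ridge(alpha=1.0))
    ])
    model.fit(X_train, y_train.ravel())
    train_r2 = r2_score(y_train, model.predict(X_train))
    val_r2 = r2_score(y_val, model.predict(X_val))
    print(f'degree={degree}: train R2={train_r2:.2f}, val R2={val_r2:.2f}')
```

Low degrees should score worse on both sets (high bias), while the gap between training and validation scores indicates variance at higher degrees.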
Bonus Experiment
Try using cross-validation to select the best polynomial degree and regularization strength automatically.
💡 Hint
Use GridSearchCV with a range of polynomial degrees and alpha values for Ridge regression to find the best combination.
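Following that hint, here is one possible sketch of a GridSearchCV setup over the same data and pipeline. The degree and alpha ranges are illustrative choices, and the `poly_features__`/`ridge_reg__` parameter names come from the step names used in the solution pipeline:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Same synthetic data and split as the solution above
np.random.seed(42)
X = np.sort(np.random.rand(100, 1) * 10, axis=0)
y = 0.5 * X**3 - X**2 + 2 + np.random.randn(100, 1) * 10
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

pipe = Pipeline([
    ('poly_features', PolynomialFeatures(include_bias=False)),
    ('ridge_reg', Ridge())
])

# Grid over polynomial degree and regularization strength (illustrative ranges)
param_grid = {
    'poly_features__degree': [1, 2, 3, 4, 5],
    'ridge_reg__alpha': [0.01, 0.1, 1.0, 10.0],
}

# 5-fold cross-validation on the training set only; the validation set
# stays held out for the final check
search = GridSearchCV(pipe, param_grid, cv=5, scoring='r2')
search.fit(X_train, y_train.ravel())

print('Best params:', search.best_params_)
print(f'Best CV R2 score: {search.best_score_:.2f}')
print(f'Validation R2 score: {search.score(X_val, y_val.ravel()):.2f}')
```

Because `refit=True` by default, `search` can be used directly as the tuned model after fitting.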