
Polynomial regression pipeline in ML Python

Introduction

Polynomial regression fits curved lines to data that a straight line describes poorly. A pipeline chains the feature transformation and the model fit into one object, so both steps always run together and in the right order.

Use this pattern in the following situations:
When the data points follow a curve rather than a straight line.
When you want to predict values that change in a non-linear way.
When you want to combine feature transformation and model training in one step.
When you want to avoid repeating data preparation steps manually.
When you want to try different polynomial degrees easily.
Syntax
ML Python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

pipeline = Pipeline([
    ('poly_features', PolynomialFeatures(degree=2)),
    ('linear_regression', LinearRegression())
])

pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)

The pipeline runs its steps in order: it first expands the input into polynomial features, then fits a linear model on those features.

Change degree to control curve complexity: degree=2 adds terms up to x², degree=3 adds terms up to x³, and so on.
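
To make the two steps concrete, the sketch below performs them by hand and then checks that the pipeline gives the same prediction. The arrays `X_demo` and `y_demo` are made-up toy data for illustration only; they are not part of this lesson's dataset.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (illustration only): y = x^2 + 1
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = X_demo.ravel() ** 2 + 1

# Step 1 by hand: expand each x into the columns [1, x, x^2]
poly = PolynomialFeatures(degree=2)
X_expanded = poly.fit_transform(X_demo)
print(X_expanded)

# Step 2 by hand: fit an ordinary linear model on the expanded columns
manual_model = LinearRegression().fit(X_expanded, y_demo)
manual_pred = manual_model.predict(poly.transform([[4.0]]))

# The pipeline performs the same two steps in one call
pipeline = Pipeline([
    ('poly_features', PolynomialFeatures(degree=2)),
    ('linear_regression', LinearRegression())
])
pipeline.fit(X_demo, y_demo)
pipeline_pred = pipeline.predict([[4.0]])

# True when the manual route and the pipeline agree
print(np.allclose(manual_pred, pipeline_pred))
```

With the manual route you have to remember to call `poly.transform` on every new input before predicting. The pipeline does that for you.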

Examples
This pipeline fits a cubic curve (degree 3) to the data.
ML Python
pipeline = Pipeline([
    ('poly_features', PolynomialFeatures(degree=3)),
    ('linear_regression', LinearRegression())
])
Degree 1 means no curve, just a straight line (simple linear regression).
ML Python
pipeline = Pipeline([
    ('poly_features', PolynomialFeatures(degree=1)),
    ('linear_regression', LinearRegression())
])
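
As a quick check of the degree 1 claim, the sketch below compares a degree 1 pipeline with a plain `LinearRegression` on the same data. The data (`X_line`, `y_line`) is a made-up noisy straight line, used here only as an assumption for the demonstration.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Hypothetical data (illustration only): a noisy straight line
rng = np.random.default_rng(0)
X_line = np.linspace(0, 10, 30).reshape(-1, 1)
y_line = 4 + 0.5 * X_line.ravel() + rng.normal(scale=0.3, size=30)

# Degree 1 only adds a constant column next to x, so no curve is possible
degree_one = Pipeline([
    ('poly_features', PolynomialFeatures(degree=1)),
    ('linear_regression', LinearRegression())
]).fit(X_line, y_line)

plain = LinearRegression().fit(X_line, y_line)

# Both models should produce the same fitted line
same = np.allclose(degree_one.predict(X_line), plain.predict(X_line))
print(same)
```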
Sample Model

This program creates curved data, fits a polynomial regression model using a pipeline, and shows how well it predicts new points.

ML Python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Create sample data: y = 1 + 2x + 3x^2 + noise
np.random.seed(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 1 + 2 * X.flatten() + 3 * X.flatten()**2 + np.random.randn(100) * 3

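# Split data into train and test: the last 20 points (largest x values)
# are held out, so the test measures prediction beyond the training range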
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

# Build polynomial regression pipeline with degree 2
pipeline = Pipeline([
    ('poly_features', PolynomialFeatures(degree=2)),
    ('linear_regression', LinearRegression())
])

# Train model
pipeline.fit(X_train, y_train)

# Predict on test data
predictions = pipeline.predict(X_test)

# Calculate mean squared error
mse = mean_squared_error(y_test, predictions)

# Print results
print(f"Mean Squared Error: {mse:.2f}")
print(f"Predictions: {predictions[:5]}")
Important Notes

PolynomialFeatures adds new columns such as x² and x³ (and, when there are several input features, products such as x₁·x₂) so that a linear model can fit curves.

A pipeline applies the same fitted transformation at training and prediction time, which avoids mistakes such as forgetting to transform the test data.

Higher degree means more complex curves but can cause overfitting.
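
The trade-off between degree and overfitting can be explored with a small experiment. The sketch below is a minimal illustration on made-up data (a noisy quadratic); the degrees 1, 2 and 10 and the 50/50 split are arbitrary choices for the demonstration. It prints training and test error for each degree.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical data (illustration only): a noisy quadratic
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = 1 + 2 * X.ravel() + 3 * X.ravel() ** 2 + rng.normal(scale=3, size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

train_mse = {}
test_mse = {}
for degree in (1, 2, 10):
    model = Pipeline([
        ('poly_features', PolynomialFeatures(degree=degree)),
        ('linear_regression', LinearRegression())
    ])
    model.fit(X_train, y_train)
    # Error on data the model has seen vs. data it has not seen
    train_mse[degree] = mean_squared_error(y_train, model.predict(X_train))
    test_mse[degree] = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse[degree]:8.2f}  "
          f"test MSE={test_mse[degree]:8.2f}")
```

Training error cannot get worse as the degree grows, because each higher-degree model contains the lower-degree ones. The test error is the number to watch: if it rises while the training error keeps falling, the model is overfitting.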

Summary

Polynomial regression fits curved lines to data.

Pipelines combine feature transformation and model training in one step.

Adjust degree to control curve complexity and fit quality.

Practice

1.

What is the main purpose of using polynomial regression instead of simple linear regression?

easy
A. To fit curved relationships between variables
B. To reduce the number of features
C. To speed up training time
D. To handle missing data automatically

Solution

  1. Step 1: Understand linear regression limitation

    Linear regression fits straight lines, which cannot capture curves in data.
  2. Step 2: Role of polynomial regression

    Polynomial regression fits curved lines by adding powers of features, capturing non-linear patterns.
  3. Final Answer:

    To fit curved relationships between variables -> Option A
  4. Quick Check:

    Polynomial regression = curved fit [OK]
Hint: Polynomial regression fits curves, not just straight lines [OK]
Common Mistakes:
  • Thinking polynomial regression reduces features
  • Assuming it speeds up training
  • Believing it handles missing data automatically
2.

Which of the following is the correct way to create a polynomial regression pipeline in Python using sklearn?

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

pipeline = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),
    ('linear', LinearRegression())
])
easy
A. pipeline = Pipeline([('poly', PolynomialFeatures(degree=2)), ('linear', LinearRegression())])
B. pipeline = Pipeline([('linear', LinearRegression()), ('poly', PolynomialFeatures(degree=2))])
C. pipeline = Pipeline([('poly', LinearRegression()), ('linear', PolynomialFeatures(degree=2))])
D. pipeline = Pipeline([('poly', PolynomialFeatures()), ('linear', LinearRegression(degree=2))])

Solution

  1. Step 1: Order of pipeline steps

    PolynomialFeatures must come before LinearRegression to transform data first.
  2. Step 2: Correct usage of classes and parameters

    PolynomialFeatures takes degree parameter; LinearRegression does not take degree.
  3. Final Answer:

    pipeline = Pipeline([('poly', PolynomialFeatures(degree=2)), ('linear', LinearRegression())]) -> Option A
  4. Quick Check:

    PolynomialFeatures before LinearRegression [OK]
Hint: Put PolynomialFeatures before LinearRegression in pipeline [OK]
Common Mistakes:
  • Swapping order of pipeline steps
  • Passing degree to LinearRegression
  • Omitting degree in PolynomialFeatures (it then defaults to 2)
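
The reason option D fails in question 2 can be checked directly. The sketch below is a minimal demonstration: it tries to construct `LinearRegression(degree=2)` and then builds the pipeline from option A. The flag `option_d_failed` is a name introduced here for illustration.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Option D: LinearRegression has no `degree` parameter, so this fails
# as soon as the estimator is constructed.
try:
    LinearRegression(degree=2)
    option_d_failed = False
except TypeError as err:
    option_d_failed = True
    print(f"TypeError: {err}")

# Option A: `degree` belongs to PolynomialFeatures
pipeline = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),
    ('linear', LinearRegression())
])
print(pipeline.named_steps['poly'].degree)
```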
3.

Given the following code, what value does the final print statement output?

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3]])
y = np.array([1, 4, 9])

pipeline = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),
    ('linear', LinearRegression())
])
pipeline.fit(X, y)
y_pred = pipeline.predict(np.array([[4]]))
print(np.round(y_pred, 2))
medium
A. [10.0]
B. [8.0]
C. [4.0]
D. [16.0]

Solution

  1. Step 1: Understand data and model

    X = [[1],[2],[3]] with y = [1,4,9] fits y = x^2 perfectly.
  2. Step 2: Predict for X=4 using polynomial degree 2

    Model learns y = x^2, so prediction at 4 is 4^2 = 16.
  3. Final Answer:

    [16.0] -> Option D
  4. Quick Check:

    4 squared = 16 [OK]
Hint: Polynomial degree 2 fits squares; predict 4^2 = 16 [OK]
Common Mistakes:
  • Ignoring polynomial transformation
  • Predicting linear value instead of squared
  • Rounding errors without np.round
4.

Identify the error in this polynomial regression pipeline code:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

pipeline = Pipeline([
    ('linear', LinearRegression()),
    ('poly', PolynomialFeatures(degree=3))
])

pipeline.fit(X_train, y_train)
medium
A. LinearRegression should not be used in pipeline
B. The order of pipeline steps is incorrect
C. PolynomialFeatures degree must be 2, not 3
D. Missing import for X_train and y_train

Solution

  1. Step 1: Check pipeline step order

    PolynomialFeatures must come before LinearRegression to transform data first.
  2. Step 2: Confirm degree and imports

    Degree 3 is valid; X_train and y_train are assumed to be defined outside the snippet.
  3. Final Answer:

    The order of pipeline steps is incorrect -> Option B
  4. Quick Check:

    PolynomialFeatures before LinearRegression [OK]
Hint: PolynomialFeatures must be first in pipeline [OK]
Common Mistakes:
  • Swapping order of steps
  • Thinking degree must be 2
  • Confusing missing data imports with pipeline error
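
The failure in question 4 can be reproduced with a short sketch. The training data here is made up for illustration, and `wrong_order_failed` is a name introduced for the demonstration. The exception is caught broadly on purpose, since the exact exception class and message may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Hypothetical training data (illustration only): y = x^3
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = X_train.ravel() ** 3

# Wrong order: every step except the last must be a transformer,
# and LinearRegression has no transform method.
try:
    wrong_order = Pipeline([
        ('linear', LinearRegression()),
        ('poly', PolynomialFeatures(degree=3))
    ])
    wrong_order.fit(X_train, y_train)
    wrong_order_failed = False
except Exception as err:  # broad catch: exact class may vary by version
    wrong_order_failed = True
    print(f"{type(err).__name__}: {err}")

# Swapping the steps puts the transformer first and the regressor last
fixed = Pipeline([
    ('poly', PolynomialFeatures(degree=3)),
    ('linear', LinearRegression())
])
fixed.fit(X_train, y_train)
print(np.round(fixed.predict([[6.0]]), 2))
```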
5.

You want to model a dataset with a complex curve. You try polynomial regression with degree=2 but the fit is poor. What is the best next step?

hard
A. Remove polynomial features and use linear regression only
B. Decrease the polynomial degree to avoid overfitting
C. Increase the polynomial degree to capture more complexity
D. Use degree=2 but reduce training data size

Solution

  1. Step 1: Understand model complexity and fit

    Degree 2 polynomial may be too simple for complex curves, causing poor fit.
  2. Step 2: Adjust polynomial degree

    Increasing the degree lets the model fit more complex patterns. Check the result on held-out data, because too high a degree overfits.
  3. Final Answer:

    Increase the polynomial degree to capture more complexity -> Option C
  4. Quick Check:

    Higher degree = more flexible curve [OK]
Hint: Raise degree to fit complex curves better [OK]
Common Mistakes:
  • Lowering degree when fit is poor
  • Removing polynomial features unnecessarily
  • Reducing data size instead of increasing model complexity
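
For question 5, one possible way to decide how far to raise the degree is to let cross-validation compare several candidates. The sketch below is an illustration under stated assumptions, not a prescribed method: the data is a made-up noisy cubic, and the candidate degrees 1 to 6 and the 5-fold setting are arbitrary choices.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

# Hypothetical data (illustration only): a noisy cubic curve
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(120, 1))
y = X.ravel() ** 3 - 2 * X.ravel() + rng.normal(scale=1.0, size=120)

pipeline = Pipeline([
    ('poly_features', PolynomialFeatures()),
    ('linear_regression', LinearRegression())
])

# '<step name>__<parameter>' addresses a parameter of a pipeline step
param_grid = {'poly_features__degree': [1, 2, 3, 4, 5, 6]}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring='neg_mean_squared_error',
    cv=5,
)
search.fit(X, y)

best_degree = search.best_params_['poly_features__degree']
print(f"Best degree by 5-fold CV: {best_degree}")
```

Because the pipeline is a single estimator, the search refits the polynomial expansion inside every fold, so each candidate degree is scored only on data the model was not trained on.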