Polynomial regression helps us find curved lines that fit data better than straight lines. A pipeline makes it easy to do all steps together without mistakes.
Polynomial regression pipeline in ML Python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

pipeline = Pipeline([
    ('poly_features', PolynomialFeatures(degree=2)),
    ('linear_regression', LinearRegression())
])

pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
The pipeline runs steps in order: first it creates polynomial features, then fits a linear model.
Change degree to control curve complexity: degree=2 adds squared terms, and degree=3 adds cubed terms as well.
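To make the effect of degree concrete, here is a minimal sketch of the columns PolynomialFeatures generates for a single input feature. The two sample values and the names X_demo, features_deg2 and features_deg3 are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Illustrative input: one feature, two samples (values chosen arbitrarily)
X_demo = np.array([[2.0], [3.0]])

# degree=2 produces the columns [1, x, x^2]
features_deg2 = PolynomialFeatures(degree=2).fit_transform(X_demo)
print(features_deg2)
# [[1. 2. 4.]
#  [1. 3. 9.]]

# degree=3 adds one more column, x^3
features_deg3 = PolynomialFeatures(degree=3).fit_transform(X_demo)
print(features_deg3.shape)  # (2, 4)
```

The leading column of ones is the bias term, which PolynomialFeatures includes by default.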
pipeline = Pipeline([
    ('poly_features', PolynomialFeatures(degree=3)),
    ('linear_regression', LinearRegression())
])

pipeline = Pipeline([
    ('poly_features', PolynomialFeatures(degree=1)),
    ('linear_regression', LinearRegression())
])

This program creates curved data, fits a polynomial regression model using a pipeline, and shows how well it predicts new points.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Create sample data: y = 1 + 2x + 3x^2 + noise
np.random.seed(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 1 + 2 * X.flatten() + 3 * X.flatten()**2 + np.random.randn(100) * 3

# Split data into train and test
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

# Build polynomial regression pipeline with degree 2
pipeline = Pipeline([
    ('poly_features', PolynomialFeatures(degree=2)),
    ('linear_regression', LinearRegression())
])

# Train model
pipeline.fit(X_train, y_train)

# Predict on test data
predictions = pipeline.predict(X_test)

# Calculate mean squared error
mse = mean_squared_error(y_test, predictions)

# Print results
print(f"Mean Squared Error: {mse:.2f}")
print(f"Predictions: {predictions[:5]}")
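One way to check a fitted pipeline beyond its error score is to look at the coefficients it learned. The sketch below regenerates the same synthetic data as the program above and reads the regression step through named_steps. As a simplifying assumption it fits on all 100 points rather than the 80-point training split, purely to keep the example short.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Same synthetic data as the program above: y = 1 + 2x + 3x^2 + noise
np.random.seed(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 1 + 2 * X.flatten() + 3 * X.flatten()**2 + np.random.randn(100) * 3

pipeline = Pipeline([
    ('poly_features', PolynomialFeatures(degree=2)),
    ('linear_regression', LinearRegression())
])
pipeline.fit(X, y)  # assumption: fit on all points for brevity

# Access the fitted regression step by the name given in the pipeline
model = pipeline.named_steps['linear_regression']

# coef_ has one entry per generated column: [bias column, x, x^2].
# The bias column's coefficient is 0 because LinearRegression fits
# its own intercept, which is stored separately in intercept_.
print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)
```

If the fit is reasonable, the intercept and the last two coefficients should land near the true values 1, 2 and 3, with some deviation caused by the noise.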
PolynomialFeatures adds new columns like x², x³ to help model curves.
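Using a pipeline avoids mistakes by running all steps together: the feature transformation fitted on the training data is applied automatically, in the same way, when you predict.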
Higher degree means more complex curves but can cause overfitting.
Polynomial regression fits curved lines to data.
Pipelines combine data changes and model training in one step.
Adjust degree to control curve complexity and fit quality.
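The trade-off between degree and fit quality can be seen by comparing training and test error across a few degrees. This is a minimal sketch on synthetic quadratic data. The data, the random split, and the degrees 1, 2 and 8 are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic quadratic data (assumed for illustration)
rng = np.random.RandomState(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 1 + 2 * X.ravel() + 3 * X.ravel() ** 2 + rng.randn(100) * 3

# Random split so train and test cover the same range of x
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

results = {}
for degree in (1, 2, 8):
    model = Pipeline([
        ('poly_features', PolynomialFeatures(degree=degree)),
        ('linear_regression', LinearRegression())
    ])
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree}: train MSE={train_mse:.2f}, test MSE={test_mse:.2f}")
```

With ordinary least squares, training error cannot get worse as the degree grows, because each lower-degree model is a special case of the higher-degree one. The test error is the number to watch when judging overfitting.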
Practice
What is the main purpose of using polynomial regression instead of simple linear regression?
Solution
Step 1: Understand linear regression limitation
Linear regression fits straight lines, which cannot capture curves in data.
Step 2: Role of polynomial regression
Polynomial regression fits curved lines by adding powers of features, capturing non-linear patterns.
Final Answer:
To fit curved relationships between variables -> Option A
Quick Check:
Polynomial regression = curved fit [OK]
- Thinking polynomial regression reduces features
- Assuming it speeds up training
- Believing it handles missing data automatically
Which of the following is the correct way to create a polynomial regression pipeline in Python using sklearn?
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
pipeline = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),
    ('linear', LinearRegression())
])
Solution
Step 1: Order of pipeline steps
PolynomialFeatures must come before LinearRegression to transform data first.
Step 2: Correct usage of classes and parameters
PolynomialFeatures takes degree parameter; LinearRegression does not take degree.
Final Answer:
pipeline = Pipeline([('poly', PolynomialFeatures(degree=2)), ('linear', LinearRegression())]) -> Option A
Quick Check:
PolynomialFeatures before LinearRegression [OK]
- Swapping order of pipeline steps
- Passing degree to LinearRegression
- Omitting degree in PolynomialFeatures
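The step order in the solution mirrors what you would otherwise write by hand: transform first, then fit. The sketch below compares a manual version with the pipeline version. The tiny dataset and the names X_new, manual_model, manual_pred and pipeline_pred are made up for illustration.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Tiny illustrative dataset (values are arbitrary)
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = np.array([1.0, 2.5, 7.0, 14.5, 25.0])
X_new = np.array([[5.0], [6.0]])

# Manual version: fit the transformer, then the model, and remember
# to apply the same fitted transformer to new data before predicting
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
manual_model = LinearRegression().fit(X_train_poly, y_train)
manual_pred = manual_model.predict(poly.transform(X_new))

# Pipeline version: the same two steps, chained in the same order
pipeline = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),
    ('linear', LinearRegression())
])
pipeline.fit(X_train, y_train)
pipeline_pred = pipeline.predict(X_new)

print(np.allclose(manual_pred, pipeline_pred))  # True
```

The pipeline removes the step that is easiest to forget in the manual version: transforming new data before calling predict.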
Given the following code, what will the final print statement output?
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
X = np.array([[1], [2], [3]])
y = np.array([1, 4, 9])
pipeline = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),
    ('linear', LinearRegression())
])
pipeline.fit(X, y)
y_pred = pipeline.predict(np.array([[4]]))
print(np.round(y_pred, 2))
Solution
Step 1: Understand data and model
X = [[1],[2],[3]] with y = [1,4,9] fits y = x^2 perfectly.
Step 2: Predict for X=4 using polynomial degree 2
Model learns y = x^2, so prediction at 4 is 4^2 = 16.
Final Answer:
[16.0] (NumPy displays this array as [16.]) -> Option D
Quick Check:
4 squared = 16 [OK]
- Ignoring polynomial transformation
- Predicting linear value instead of squared
- Rounding errors without np.round
Identify the error in this polynomial regression pipeline code:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
pipeline = Pipeline([
    ('linear', LinearRegression()),
    ('poly', PolynomialFeatures(degree=3))
])
pipeline.fit(X_train, y_train)
Solution
Step 1: Check pipeline step order
PolynomialFeatures must come before LinearRegression to transform data first.
Step 2: Confirm degree and imports
Degree 3 is valid; X_train and y_train are assumed to be defined outside the snippet.
Final Answer:
The order of pipeline steps is incorrect -> Option B
Quick Check:
PolynomialFeatures before LinearRegression [OK]
- Swapping order of steps
- Thinking degree must be 2
- Confusing missing data imports with pipeline error
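To see what the wrong order does in practice, the sketch below builds the faulty pipeline on a small made-up dataset and captures the resulting error. It assumes a recent scikit-learn release, where an intermediate step without a transform method is expected to raise a TypeError. The construction is inside the try block because the check may run either when the pipeline is created or when it is fitted, depending on the version.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Small illustrative dataset (values are arbitrary)
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([1.0, 8.0, 27.0, 64.0])

error_message = None
try:
    # Wrong order: LinearRegression is not a transformer, so it
    # cannot be an intermediate pipeline step
    bad_pipeline = Pipeline([
        ('linear', LinearRegression()),
        ('poly', PolynomialFeatures(degree=3))
    ])
    bad_pipeline.fit(X_train, y_train)
except TypeError as exc:  # assumption: error type in recent versions
    error_message = str(exc)

print(error_message)
```

Every step except the last must be able to transform data, which is why the estimator belongs at the end.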
You want to model a dataset with a complex curve. You try polynomial regression with degree=2 but the fit is poor. What is the best next step?
Solution
Step 1: Understand model complexity and fit
A degree-2 polynomial may be too simple for a complex curve, causing a poor fit (underfitting).
Step 2: Adjust polynomial degree
Increasing the degree lets the model fit more complex patterns, which can improve fit quality. Check the error on held-out data as you go, since too high a degree can overfit.
Final Answer:
Increase the polynomial degree to capture more complexity -> Option C
Quick Check:
Higher degree = more flexible fit [OK]
- Lowering degree when fit is poor
- Removing polynomial features unnecessarily
- Reducing data size instead of model complexity
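Rather than raising the degree by hand, one option is to let cross-validation pick it. The sketch below uses GridSearchCV over the pipeline's degree parameter. The cubic dataset, the candidate degrees 1 to 5, and the shuffled 5-fold split are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic cubic data (assumed): a degree-2 curve cannot fit this shape
rng = np.random.RandomState(0)
X = np.linspace(-3, 3, 120).reshape(-1, 1)
y = X.ravel() ** 3 - 2 * X.ravel() + rng.randn(120) * 0.5

pipeline = Pipeline([
    ('poly', PolynomialFeatures()),
    ('linear', LinearRegression())
])

# Parameter name = step name + double underscore + parameter name
param_grid = {'poly__degree': [1, 2, 3, 4, 5]}

# Shuffle the folds because X is sorted; unshuffled folds would
# test each model on a range of x it never saw during training
cv = KFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    pipeline, param_grid, cv=cv, scoring='neg_mean_squared_error'
)
search.fit(X, y)

best_degree = search.best_params_['poly__degree']
print("Best degree:", best_degree)
print("Cross-validated MSE:", -search.best_score_)
```

Because the score is measured on held-out folds, this approach favors a degree that generalizes over one that merely fits the training points more closely.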
