
Polynomial regression pipeline in ML Python

Introduction

Polynomial regression fits curved relationships that a straight line cannot capture. A pipeline chains the feature expansion and the model fitting into one object, so the steps always run together and in the right order.

Use a polynomial regression pipeline when:

The data points follow a curve, not a straight line.
The target changes with the input in a non-linear way.
You want feature creation and model training combined in one step.
You want to avoid repeating data preparation steps manually.
You want to try different curve degrees easily.
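The last point can be made concrete: because the pipeline exposes each step's parameters under the name step__parameter, a cross-validated search over the degree is a few lines. A minimal sketch (the step name poly_features and the sample data here are assumptions, not from any specific dataset):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

# Noisy quadratic data: y = 1 + 2x + 3x^2 + noise
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = 1 + 2 * X.ravel() + 3 * X.ravel() ** 2 + rng.normal(scale=3, size=60)

pipeline = Pipeline([
    ('poly_features', PolynomialFeatures()),
    ('linear_regression', LinearRegression())
])

# Try degrees 1-4 with 5-fold cross-validation
search = GridSearchCV(pipeline, {'poly_features__degree': [1, 2, 3, 4]}, cv=5)
search.fit(X, y)
print(search.best_params_)
```

On quadratic data like this, the search should prefer a degree of at least 2, since a straight line leaves the curvature unexplained.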
Syntax
ML Python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

pipeline = Pipeline([
    ('poly_features', PolynomialFeatures(degree=2)),
    ('linear_regression', LinearRegression())
])

pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)

The pipeline runs its steps in order: PolynomialFeatures first expands the input into polynomial terms, then LinearRegression fits a linear model on those expanded features.

Change degree to control curve complexity (degree=2 adds squared terms, plus interaction terms when there are multiple input columns).
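To see exactly what the first step produces, you can call PolynomialFeatures on its own. With degree=2 and a single input column, each value x becomes the row [1, x, x²] (the leading 1 is the bias column, included by default):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0], [3.0]])
poly = PolynomialFeatures(degree=2)

# Each row becomes [1, x, x^2]
print(poly.fit_transform(X))
# [[1. 2. 4.]
#  [1. 3. 9.]]
```

The linear model then learns one coefficient per column, which is what makes the overall fit a curve.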

Examples
This pipeline fits a cubic curve (degree 3) to the data.
ML Python
pipeline = Pipeline([
    ('poly_features', PolynomialFeatures(degree=3)),
    ('linear_regression', LinearRegression())
])
With degree=1, no higher-order terms are added, so the pipeline reduces to ordinary linear regression (a straight line).
ML Python
pipeline = Pipeline([
    ('poly_features', PolynomialFeatures(degree=1)),
    ('linear_regression', LinearRegression())
])
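You can verify that equivalence directly: a degree-1 pipeline and a plain LinearRegression produce the same predictions. A quick sketch on made-up data (the data itself is an assumption for illustration):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Noisy straight-line data
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 1))
y = 4.0 * X.ravel() - 2.0 + rng.normal(scale=0.1, size=50)

pipe = Pipeline([
    ('poly_features', PolynomialFeatures(degree=1)),
    ('linear_regression', LinearRegression())
])
pipe.fit(X, y)

plain = LinearRegression().fit(X, y)

# degree=1 adds no new powers, so predictions match plain linear regression
print(np.allclose(pipe.predict(X), plain.predict(X)))  # True
```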
Sample Model

This program creates curved data, fits a polynomial regression model using a pipeline, and shows how well it predicts new points.

ML Python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Create sample data: y = 1 + 2x + 3x^2 + noise
np.random.seed(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 1 + 2 * X.flatten() + 3 * X.flatten()**2 + np.random.randn(100) * 3

# Split into train and test sets (note: X is sorted, so the test set holds the largest x values)
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

# Build polynomial regression pipeline with degree 2
pipeline = Pipeline([
    ('poly_features', PolynomialFeatures(degree=2)),
    ('linear_regression', LinearRegression())
])

# Train model
pipeline.fit(X_train, y_train)

# Predict on test data
predictions = pipeline.predict(X_test)

# Calculate mean squared error
mse = mean_squared_error(y_test, predictions)

# Print results
print(f"Mean Squared Error: {mse:.2f}")
print(f"Predictions: {predictions[:5]}")
Important Notes

PolynomialFeatures adds new columns like x², x³ to help model curves.

Using a pipeline avoids mistakes by running all steps together.

Higher degree means more complex curves but can cause overfitting.
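The overfitting risk is easy to demonstrate: as the degree grows, training error keeps falling, but test error can stop improving or get worse. A hedged sketch on the same kind of quadratic toy data used above (the specific degrees 1, 2, and 10 are illustrative choices):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Noisy quadratic data: y = 1 + 2x + 3x^2 + noise
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, 100).reshape(-1, 1)
y = 1 + 2 * X.ravel() + 3 * X.ravel() ** 2 + rng.normal(scale=3, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for degree in [1, 2, 10]:
    pipe = Pipeline([
        ('poly_features', PolynomialFeatures(degree=degree)),
        ('linear_regression', LinearRegression())
    ])
    pipe.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, pipe.predict(X_train))
    test_mse = mean_squared_error(y_test, pipe.predict(X_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.2f}  test MSE={test_mse:.2f}")
```

Degree 1 underfits (both errors high), degree 2 matches the true curve, and degree 10 drives training error down further without a corresponding gain on the test set.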

Summary

Polynomial regression fits curved lines to data.

Pipelines combine data changes and model training in one step.

Adjust degree to control curve complexity and fit quality.