ML Python · ~20 mins

Polynomial features in ML Python - ML Experiment: Train & Evaluate

Experiment - Polynomial features
Problem: You want to predict house prices from the size of the house. The current model assumes a simple linear relationship and does not capture the curve in the data.
Current Metrics: Training R2 score: 0.75, Validation R2 score: 0.70
Issue: The model underfits because a straight line cannot capture the non-linear relationship between house size and price.
Your Task
Improve the model by adding polynomial features to capture the curve and increase validation R2 score to at least 0.85.
Use polynomial features of degree 2 only.
Keep the model as a simple linear regression after adding polynomial features.
Do not change the dataset or target variable.
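Before adding polynomial features, it helps to see the underfitting baseline. A minimal sketch of the linear-only model on synthetic quadratic data (the dataset here is illustrative, generated the same way as in the solution below):

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
import numpy as np

# Synthetic data with a quadratic size-price relationship
np.random.seed(0)
X = np.random.rand(100, 1) * 100  # house size in square meters
y = 50 + 3 * X.flatten() + 0.5 * (X.flatten() ** 2) + np.random.randn(100) * 10

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a plain linear regression on the raw feature only
baseline = LinearRegression().fit(X_train, y_train)
print(f"Baseline validation R2: {r2_score(y_val, baseline.predict(X_val)):.2f}")
```

The straight line misses the curvature, so its residuals are systematically biased at small and large house sizes, which is what the polynomial features in the solution fix.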
Solution
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
import numpy as np

# Sample synthetic data simulating house size and price
np.random.seed(0)
X = np.random.rand(100, 1) * 100  # House size in square meters
# Price follows a quadratic relation plus noise
y = 50 + 3 * X.flatten() + 0.5 * (X.flatten() ** 2) + np.random.randn(100) * 10

# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Create polynomial features of degree 2
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train)
X_val_poly = poly.transform(X_val)

# Train linear regression on polynomial features
model = LinearRegression()
model.fit(X_train_poly, y_train)

# Predict and evaluate
train_pred = model.predict(X_train_poly)
val_pred = model.predict(X_val_poly)

train_r2 = r2_score(y_train, train_pred)
val_r2 = r2_score(y_val, val_pred)

print(f"Training R2 score: {train_r2:.2f}")
print(f"Validation R2 score: {val_r2:.2f}")
Added polynomial features of degree 2 to input data to capture non-linear relationships.
Kept the model as linear regression but trained on transformed polynomial features.
Evaluated model performance using R2 score on training and validation sets.
Results Interpretation

Before: Training R2 = 0.75, Validation R2 = 0.70

After: Training R2 = 0.95, Validation R2 = 0.88 (illustrative targets; with the low-noise synthetic data above, both scores actually land close to 1.0)

Adding polynomial features helps the model learn curved relationships in data, reducing underfitting and improving prediction accuracy.
Bonus Experiment
Try polynomial features of degree 3 and observe if the validation score improves or if the model starts to overfit.
💡 Hint
Higher degree polynomials can fit training data better but may cause overfitting. Use validation scores to check.
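One way to run the bonus experiment: sweep the degree and compare training versus validation scores on the same synthetic data. This is a sketch; with this low-noise quadratic dataset the higher-degree fits stay close to perfect, and you would need noisier or smaller data to see validation scores drop from overfitting.

```python
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
import numpy as np

np.random.seed(0)
X = np.random.rand(100, 1) * 100
y = 50 + 3 * X.flatten() + 0.5 * (X.flatten() ** 2) + np.random.randn(100) * 10
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Compare degrees 1 (the underfitting baseline), 2 (the task), and 3 (the bonus)
for degree in (1, 2, 3):
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    X_train_poly = poly.fit_transform(X_train)
    model = LinearRegression().fit(X_train_poly, y_train)
    train_r2 = r2_score(y_train, model.predict(X_train_poly))
    val_r2 = r2_score(y_val, model.predict(poly.transform(X_val)))
    print(f"degree={degree}: train R2={train_r2:.3f}, val R2={val_r2:.3f}")
```

A widening gap between the training and validation scores as the degree grows is the overfitting signal the hint describes.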