Polynomial features in ML Python - ML Experiment: Train & Evaluate

Experiment - Polynomial features
Problem: You want to predict house prices based on the size of the house. The current model uses a simple linear relationship but does not capture the curve in the data.
Current Metrics: Training R2 score: 0.75, Validation R2 score: 0.70
Issue: The model underfits because it cannot capture the non-linear relationship between house size and price.
Your Task
Improve the model by adding polynomial features to capture the curve and increase validation R2 score to at least 0.85.
Use polynomial features of degree 2 only.
Keep the model as a simple linear regression after adding polynomial features.
Do not change the dataset or target variable.
Solution
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
import numpy as np

# Sample synthetic data simulating house size and price
np.random.seed(0)
X = np.random.rand(100, 1) * 100  # House size in square meters
# Price follows a quadratic relation plus noise
y = 50 + 3 * X.flatten() + 0.5 * (X.flatten() ** 2) + np.random.randn(100) * 10

# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Create polynomial features of degree 2
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train)
X_val_poly = poly.transform(X_val)

# Train linear regression on polynomial features
model = LinearRegression()
model.fit(X_train_poly, y_train)

# Predict and evaluate
train_pred = model.predict(X_train_poly)
val_pred = model.predict(X_val_poly)

train_r2 = r2_score(y_train, train_pred)
val_r2 = r2_score(y_val, val_pred)

print(f"Training R2 score: {train_r2:.2f}")
print(f"Validation R2 score: {val_r2:.2f}")
Added polynomial features of degree 2 to input data to capture non-linear relationships.
Kept the model as linear regression but trained on transformed polynomial features.
Evaluated model performance using R2 score on training and validation sets.
Results Interpretation

Before: Training R2 = 0.75, Validation R2 = 0.70

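After: Training R2 = 0.95, Validation R2 = 0.88 (illustrative figures for the exercise; the synthetic data in the solution code has very little noise relative to the price range, so running it as written prints scores close to 1.00)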

Adding polynomial features helps the model learn curved relationships in data, reducing underfitting and improving prediction accuracy.
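To see the before/after effect directly, the sketch below fits a plain linear baseline and a degree-2 model on the same synthetic data as the solution. Using `make_pipeline` to chain the two steps is a choice made here for brevity, not something the exercise requires, and the variable names are illustrative. The exact scores depend on the data, so on this synthetic set they will not match the figures quoted above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same synthetic data and split as the solution code
np.random.seed(0)
X = np.random.rand(100, 1) * 100
y = 50 + 3 * X.flatten() + 0.5 * (X.flatten() ** 2) + np.random.randn(100) * 10
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Baseline: linear regression on the raw house size only
baseline = LinearRegression().fit(X_train, y_train)
baseline_val_r2 = r2_score(y_val, baseline.predict(X_val))

# Degree-2 model: the pipeline expands the features, then fits linear regression
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
poly_model.fit(X_train, y_train)
poly_val_r2 = r2_score(y_val, poly_model.predict(X_val))

print(f"Baseline validation R2: {baseline_val_r2:.3f}")
print(f"Degree-2 validation R2: {poly_val_r2:.3f}")
```

The pipeline keeps the transform and the regression together, so the validation data is always transformed with the features fitted on the training set.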
Bonus Experiment
Try polynomial features of degree 3 and observe if the validation score improves or if the model starts to overfit.
💡 Hint
Higher degree polynomials can fit training data better but may cause overfitting. Use validation scores to check.
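One possible way to run the bonus experiment is to loop over degrees and record both scores for each. This is a minimal sketch on the same synthetic data as the solution; the `scores` dictionary and the use of `make_pipeline` are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same synthetic data and split as the solution code
np.random.seed(0)
X = np.random.rand(100, 1) * 100
y = 50 + 3 * X.flatten() + 0.5 * (X.flatten() ** 2) + np.random.randn(100) * 10
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

scores = {}  # degree -> (training R2, validation R2)
for degree in (1, 2, 3):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    model.fit(X_train, y_train)
    scores[degree] = (
        r2_score(y_train, model.predict(X_train)),
        r2_score(y_val, model.predict(X_val)),
    )

for degree, (train_r2, val_r2) in scores.items():
    print(f"degree={degree}: train R2={train_r2:.4f}, validation R2={val_r2:.4f}")
```

A training score that keeps rising while the validation score stalls or drops is the sign of overfitting the hint describes.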

Practice

1. What is the main purpose of using PolynomialFeatures in machine learning?
easy
A. To create new features by adding powers and combinations of existing features
B. To reduce the number of features in the dataset
C. To normalize the data between 0 and 1
D. To split the dataset into training and testing sets

Solution

  1. Step 1: Understand the role of PolynomialFeatures

    PolynomialFeatures generates new features by raising existing features to powers and combining them, helping models learn curves.
  2. Step 2: Compare with other options

    Feature reduction, normalization between 0 and 1, and splitting into training/testing sets describe different preprocessing steps, not feature creation with powers.
  3. Final Answer:

    To create new features by adding powers and combinations of existing features -> Option A
  4. Quick Check:

    PolynomialFeatures = create new polynomial features [OK]
Hint: PolynomialFeatures adds powers and combos of features [OK]
Common Mistakes:
  • Confusing feature creation with normalization
  • Thinking it reduces features instead of expanding
  • Mixing it up with data splitting
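As a small illustration of feature expansion, the sketch below starts from one input column and produces its powers up to degree 3. The input values are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One feature, two samples
X = np.array([[2.0], [3.0]])

# Expand to x, x^2, x^3 (no constant column because include_bias=False)
poly = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly.fit_transform(X)

print(X.shape, "->", X_poly.shape)
print(X_poly)
```

The number of columns grows from 1 to 3, which is the opposite of feature reduction.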
2. Which of the following is the correct way to import and create polynomial features of degree 2 using scikit-learn?
easy
A. from sklearn.preprocessing import PolynomialFeatures; poly = PolynomialFeatures(degree=2)
B. from sklearn.linear_model import PolynomialFeatures; poly = PolynomialFeatures(2)
C. import PolynomialFeatures from sklearn.preprocessing; poly = PolynomialFeatures(degree=2)
D. from sklearn.preprocessing import PolynomialFeatures; poly = PolynomialFeatures(3)

Solution

  1. Step 1: Check the correct import statement

    PolynomialFeatures is in sklearn.preprocessing, so 'from sklearn.preprocessing import PolynomialFeatures' is correct.
  2. Step 2: Verify the degree parameter

    To create degree 2 features, use degree=2 in the constructor.
  3. Final Answer:

    from sklearn.preprocessing import PolynomialFeatures; poly = PolynomialFeatures(degree=2) -> Option A
  4. Quick Check:

    Import from preprocessing and set degree=2 [OK]
Hint: Import from preprocessing and set degree=2 [OK]
Common Mistakes:
  • Importing from wrong module
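  • Rejecting option B because of the missing 'degree=' keyword (the degree can also be passed positionally; the real problem in B is the import)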
  • Setting wrong degree value
3. Given the code below, what is the output of X_poly?
from sklearn.preprocessing import PolynomialFeatures
import numpy as np
X = np.array([[2, 3]])
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
print(X_poly)
medium
A. [[2 3 5 6 9]]
B. [[1 2 3 4 6 9]]
C. [[2 3 4 6 9]]
D. [[2 3 4 5 6 9]]

Solution

  1. Step 1: Understand PolynomialFeatures output with degree=2 and include_bias=False

    Features include original features, their squares, and pairwise products: [x1, x2, x1^2, x1*x2, x2^2].
  2. Step 2: Calculate values for X = [2, 3]

    x1=2, x2=3; x1^2=4, x1*x2=6, x2^2=9; so the values are [[2, 3, 4, 6, 9]]. scikit-learn returns them as floats, so the actual printout reads [[2. 3. 4. 6. 9.]].
  3. Final Answer:

    [[2 3 4 6 9]] -> Option C
  4. Quick Check:

    Polynomial features = original + squares + products [OK]
Hint: Output includes original, squares, and cross-products [OK]
Common Mistakes:
  • Including bias term when include_bias=False
  • Miscomputing squares or products
  • Adding extra features not in degree 2
4. Identify the error in the following code snippet that uses PolynomialFeatures:
from sklearn.preprocessing import PolynomialFeatures
X = [[1, 2], [3, 4]]
poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)
print(X_poly)
medium
A. X should be a NumPy array, not a list of lists
B. No error; code runs correctly
C. Missing import for NumPy
D. Degree 3 is not supported by PolynomialFeatures

Solution

  1. Step 1: Check input type compatibility

    PolynomialFeatures accepts lists or arrays, so X as list of lists is valid.
  2. Step 2: Verify degree parameter and imports

    Degree 3 is supported; no NumPy import needed if not used explicitly.
  3. Final Answer:

    No error; code runs correctly -> Option B
  4. Quick Check:

    PolynomialFeatures accepts lists and degree 3 [OK]
Hint: PolynomialFeatures accepts lists; degree 3 is valid [OK]
Common Mistakes:
  • Assuming input must be NumPy array
  • Thinking degree 3 is invalid
  • Expecting import errors without NumPy usage
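Running the snippet from the question confirms that a plain list of lists is accepted. The sketch below also prints the result type and shape; with 2 features, degree 3, and the default include_bias=True, the output has 10 columns.

```python
from sklearn.preprocessing import PolynomialFeatures

# Plain Python list of lists, no NumPy import needed by the caller
X = [[1, 2], [3, 4]]
poly = PolynomialFeatures(degree=3)  # include_bias defaults to True
X_poly = poly.fit_transform(X)

# The result is a NumPy array even though the input was a list
print(type(X_poly).__name__, X_poly.shape)
print(X_poly[0])
```

The first column is the constant 1 added by the bias term.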
5. You have a dataset with 3 features and want to add polynomial features up to degree 3. How many features will the transformed dataset have if include_bias=False?
hard
A. 10
B. 20
C. 16
D. 19

Solution

  1. Step 1: Use formula for number of polynomial features

    Number of features = C(n + d, d) - 1 if include_bias=False, where n is the number of input features and d is the degree (here n=3, d=3).
  2. Step 2: Calculate combinations

    C(3+3, 3) = C(6, 3) = 20; subtract 1 for no bias gives 19 features.
  3. Final Answer:

    19 -> Option D
  4. Quick Check:

    Features = combinations(6,3)-1 = 19 [OK]
Hint: Use combinations(n+d, d) minus bias if excluded [OK]
Common Mistakes:
  • Forgetting to subtract bias feature
  • Using wrong combination formula
  • Confusing degree with number of features
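The formula can be checked without scikit-learn by counting the monomials directly. In this sketch the helper names `count_polynomial_features` and `enumerate_monomials` are hypothetical, introduced only for illustration.

```python
from itertools import combinations_with_replacement
from math import comb


def count_polynomial_features(n_features, degree, include_bias=False):
    """Closed-form count: C(n + d, d), minus 1 when the bias column is excluded."""
    total = comb(n_features + degree, degree)
    return total if include_bias else total - 1


def enumerate_monomials(n_features, degree):
    """List every monomial of total degree 1..degree as a tuple of feature indices."""
    return [
        combo
        for d in range(1, degree + 1)
        for combo in combinations_with_replacement(range(n_features), d)
    ]


formula_count = count_polynomial_features(3, 3)
enumerated_count = len(enumerate_monomials(3, 3))
print(formula_count, enumerated_count)
```

For 3 features and degree 3 the enumeration breaks down as 3 linear terms, 6 degree-2 terms, and 10 degree-3 terms, which sums to 19.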