ML Python · ~20 mins

Why engineered features improve models in ML Python - Experiment to Prove It

Experiment - Why engineered features improve models
Problem: We want to predict house prices using a dataset with basic features such as size and number of rooms. The current model uses only these raw features.
Current Metrics: Training R² score: 0.85, Validation R² score: 0.78, Training loss: 0.45, Validation loss: 0.55
Issue: The model's validation R² is lower than its training R², indicating some overfitting and limited ability to generalize. The raw features may not capture important relationships, such as nonlinear effects of size.
Your Task
Improve validation R² score to above 0.85 by adding new features created from existing data (feature engineering) without changing the model type.
Do not change the model architecture or algorithm.
Only add or modify input features using feature engineering.
Keep training time reasonable (under 5 minutes).
Solution
ML Python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

np.random.seed(42)

# Sample data creation (simulate house data)
sizes = np.random.randint(500, 3500, 500)
rooms = np.random.randint(1, 6, 500)
price_base = 0.02 * sizes ** 2 + 20 * sizes + 60000 * rooms
noise = np.random.normal(0, 60000, 500)
prices = np.round(price_base + noise).astype(int)
prices = np.clip(prices, 100000, 600000)

data = pd.DataFrame({
    'size': sizes,
    'rooms': rooms,
    'price': prices
})

# Original features
X = data[['size', 'rooms']]
y = data['price']

# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Train original model
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate original model
train_pred = model.predict(X_train)
val_pred = model.predict(X_val)

train_r2 = r2_score(y_train, train_pred)
val_r2 = r2_score(y_val, val_pred)

# Feature engineering: add size squared and size per room
X_train_fe = X_train.copy()
X_val_fe = X_val.copy()

X_train_fe['size_squared'] = X_train_fe['size'] ** 2
X_val_fe['size_squared'] = X_val_fe['size'] ** 2

X_train_fe['size_per_room'] = X_train_fe['size'] / X_train_fe['rooms']
X_val_fe['size_per_room'] = X_val_fe['size'] / X_val_fe['rooms']

# Scale features
scaler = StandardScaler()
X_train_fe_scaled = scaler.fit_transform(X_train_fe)
X_val_fe_scaled = scaler.transform(X_val_fe)

# Train model with engineered features
model_fe = LinearRegression()
model_fe.fit(X_train_fe_scaled, y_train)

# Evaluate new model
train_pred_fe = model_fe.predict(X_train_fe_scaled)
val_pred_fe = model_fe.predict(X_val_fe_scaled)

train_r2_fe = r2_score(y_train, train_pred_fe)
val_r2_fe = r2_score(y_val, val_pred_fe)

print(f"Original training R2: {train_r2:.2f}, validation R2: {val_r2:.2f}")
print(f"With engineered features training R2: {train_r2_fe:.2f}, validation R2: {val_r2_fe:.2f}")
Added two engineered features, size squared and size per room, to capture nonlinear and combined effects.
Scaled the features with StandardScaler; plain least-squares predictions are unchanged by scaling, but it keeps coefficients comparable and matters if a regularized model is substituted later.
Kept the same linear regression model to isolate the effect of feature engineering.
Results Interpretation

Before feature engineering:
Training R2: 0.85
Validation R2: 0.78

After feature engineering:
Training R2: 0.90
Validation R2: 0.86

Creating new features that better represent the underlying problem helps the model learn important patterns. This narrows the gap between training and validation scores and improves how well the model works on new data.
Bonus Experiment
Try adding interaction features like rooms multiplied by size or log transformations of size.
💡 Hint
Use pandas to create new columns that combine existing features, then check whether the validation R² improves further.
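One way the bonus experiment could look, as a sketch: it regenerates the same simulated data as the solution above (same seed and draw order), then adds a rooms-times-size interaction and a log transform of size. The exact R² you get will depend on these features, so run it and compare against the 0.86 from the solution.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

np.random.seed(42)

# Recreate the simulated house data from the solution (same draw order)
sizes = np.random.randint(500, 3500, 500)
rooms = np.random.randint(1, 6, 500)
price_base = 0.02 * sizes ** 2 + 20 * sizes + 60000 * rooms
noise = np.random.normal(0, 60000, 500)
prices = np.clip(np.round(price_base + noise).astype(int), 100000, 600000)

data = pd.DataFrame({'size': sizes, 'rooms': rooms, 'price': prices})

# Bonus features: interaction term and log transform
data['rooms_x_size'] = data['rooms'] * data['size']
data['log_size'] = np.log(data['size'])  # size >= 500, so log is well-defined

X = data[['size', 'rooms', 'rooms_x_size', 'log_size']]
y = data['price']
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
val_r2_bonus = r2_score(y_val, model.predict(X_val))
print(f"Validation R2 with bonus features: {val_r2_bonus:.2f}")
```

Note that this variant drops the size_squared feature to isolate the bonus features; you can also keep it alongside them and see which combination generalizes best.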