
Why engineered features improve models in ML Python

Introduction
Engineered features help models learn better by giving them clearer, more useful information derived from the data. Feature engineering is especially valuable in situations such as these:
When raw data is too complex or noisy for the model to learn from directly.
When you want to highlight important patterns or relationships in the data.
When the model's accuracy is low and you want to improve it by adding meaningful inputs.
When you have domain knowledge that can be turned into new, helpful data points.
When you want to reduce the amount of data the model needs to learn from.
Syntax
ML Python
There is no fixed syntax: feature engineering means creating new data columns or transforming existing ones using code or tools.
It often uses simple operations such as arithmetic, grouping, or combining columns.
In Python, it is typically done with libraries such as pandas.
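The three kinds of operations mentioned above (arithmetic, grouping, combining columns) can be sketched with pandas on a small, hypothetical data frame:

```python
import pandas as pd

# Hypothetical sample data to illustrate common transformations
df = pd.DataFrame({
    'city': ['A', 'A', 'B', 'B'],
    'price': [100, 120, 80, 90],
    'rooms': [2, 3, 2, 4],
})

# Arithmetic: a simple ratio feature
df['price_per_room'] = df['price'] / df['rooms']

# Grouping: each row gets its city's average price
df['city_avg_price'] = df.groupby('city')['price'].transform('mean')

# Combining columns: deviation from the group average
df['price_vs_city_avg'] = df['price'] - df['city_avg_price']

print(df)
```

The column names here are invented for illustration; the same patterns apply to any numeric columns in your own data.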
Examples
Create a new feature by squaring the 'age' column to capture non-linear effects.
ML Python
df['age_squared'] = df['age'] ** 2
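Squaring a column by hand works fine; when you want such terms for many columns at once, scikit-learn's PolynomialFeatures can generate them automatically. A sketch on a hypothetical age column:

```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data frame with an 'age' column
df = pd.DataFrame({'age': [25, 32, 47]})

# degree=2 adds age**2; include_bias=False drops the constant column
poly = PolynomialFeatures(degree=2, include_bias=False)
expanded = poly.fit_transform(df[['age']])

# Column 0 is age itself; column 1 is age**2
df['age_squared'] = expanded[:, 1]
print(df)
```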
Create a new feature that divides income by family size to get a per-person value.
ML Python
df['income_per_person'] = df['income'] / df['family_size']
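Ratio features need care when the denominator can be zero: pandas returns inf rather than raising, and inf values will break most models. A defensive variant, assuming a data frame that may contain a zero family size:

```python
import numpy as np
import pandas as pd

# Hypothetical data including a zero family size
df = pd.DataFrame({'income': [50000, 60000, 80000],
                   'family_size': [2, 0, 4]})

# Divide, then replace inf (from zero division) with NaN
# so that downstream imputers or models can handle it
df['income_per_person'] = (df['income'] / df['family_size']).replace(
    [np.inf, -np.inf], np.nan)

print(df)
```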
Create a binary feature indicating if a day is on the weekend.
ML Python
df['is_weekend'] = df['day_of_week'].apply(lambda x: 1 if x in ['Saturday', 'Sunday'] else 0)
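The apply/lambda version above works row by row; the same feature can be built with the vectorized isin, which is more idiomatic and usually faster on large frames:

```python
import pandas as pd

# Hypothetical day-of-week column
df = pd.DataFrame({'day_of_week': ['Monday', 'Saturday', 'Sunday', 'Friday']})

# isin returns a boolean Series; astype(int) converts True/False to 1/0
df['is_weekend'] = df['day_of_week'].isin(['Saturday', 'Sunday']).astype(int)

print(df)
```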
Sample Model
This example shows how adding an engineered feature, 'income_per_person', can help the model predict house prices with lower error. With such a tiny dataset the comparison is only illustrative.
ML Python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Sample data
data = {'age': [25, 32, 47, 51, 62],
        'income': [50000, 60000, 80000, 90000, 120000],
        'family_size': [3, 4, 2, 5, 3],
        'house_price': [200000, 250000, 320000, 360000, 400000]}

# Create DataFrame
df = pd.DataFrame(data)

# Feature engineering: create income per person
df['income_per_person'] = df['income'] / df['family_size']

# Prepare features and target
X = df[['age', 'income', 'family_size']]
X_eng = df[['age', 'income', 'family_size', 'income_per_person']]
y = df['house_price']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
X_eng_train, X_eng_test, y_train_eng, y_test_eng = train_test_split(X_eng, y, random_state=42)

# Train model without engineered feature
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mse_without = mean_squared_error(y_test, y_pred)

# Train model with engineered feature
model_eng = LinearRegression()
model_eng.fit(X_eng_train, y_train_eng)
y_pred_eng = model_eng.predict(X_eng_test)
mse_with = mean_squared_error(y_test_eng, y_pred_eng)

print(f"MSE without engineered feature: {mse_without:.2e}")
print(f"MSE with engineered feature: {mse_with:.2e}")
Important Notes
Feature engineering can greatly improve model results but requires understanding the data well.
Adding too many features can introduce noise and lead to overfitting, so choose features carefully.
Try simple transformations first before complex ones.
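One simple way to act on the note about choosing features carefully is univariate feature selection; a sketch using scikit-learn's SelectKBest with an F-test, on a hypothetical frame where one column is pure noise:

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

# Hypothetical frame: two meaningful columns and one noise column
df = pd.DataFrame({
    'income': [50, 60, 80, 90, 120],
    'family_size': [3, 4, 2, 5, 3],
    'noise': [7, 1, 4, 9, 2],
    'house_price': [200, 250, 320, 360, 400],
})

X = df[['income', 'family_size', 'noise']]
y = df['house_price']

# Keep the 2 features most related to the target (univariate F-test)
selector = SelectKBest(score_func=f_regression, k=2)
selector.fit(X, y)

kept = X.columns[selector.get_support()].tolist()
print(kept)
```

This is only a first-pass filter; features that matter in combination can be missed by univariate scores, so treat it as a starting point rather than a final answer.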
Summary
Engineered features give models clearer, more useful information.
They help models find patterns that raw data might hide.
Good feature engineering can improve accuracy and reduce errors.