
Creating interaction features in ML Python - Experiment Walkthrough

Experiment - Creating interaction features
Problem: You have a dataset with two features, and you want to improve a simple model's accuracy by adding interaction features that combine them.
Current Metrics: Training accuracy: 75%, Validation accuracy: 72%
Issue: The model is not capturing relationships between the features, which limits its accuracy.
Your Task
Increase validation accuracy to at least 78% by creating interaction features from the existing features.
You may only add interaction features (products, sums, or other combinations of the existing features).
Do not change the model type or hyperparameters.
Use the same train/test split.
Solution
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Create a simple dataset
X, y = make_classification(n_samples=500, n_features=2, n_informative=2, n_redundant=0, random_state=42)

# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model without interaction features
model = LogisticRegression()
model.fit(X_train, y_train)

train_pred = model.predict(X_train)
val_pred = model.predict(X_val)

train_acc_before = accuracy_score(y_train, train_pred) * 100
val_acc_before = accuracy_score(y_val, val_pred) * 100

# Create interaction features
# Multiply feature 0 and feature 1
interaction_feature = (X_train[:, 0] * X_train[:, 1]).reshape(-1, 1)
interaction_feature_val = (X_val[:, 0] * X_val[:, 1]).reshape(-1, 1)

# Add interaction feature to original features
X_train_new = np.hstack([X_train, interaction_feature])
X_val_new = np.hstack([X_val, interaction_feature_val])

# Train model with interaction features
model_new = LogisticRegression()
model_new.fit(X_train_new, y_train)

train_pred_new = model_new.predict(X_train_new)
val_pred_new = model_new.predict(X_val_new)

train_acc_after = accuracy_score(y_train, train_pred_new) * 100
val_acc_after = accuracy_score(y_val, val_pred_new) * 100

print(f"Training accuracy before: {train_acc_before:.2f}%")
print(f"Validation accuracy before: {val_acc_before:.2f}%")
print(f"Training accuracy after: {train_acc_after:.2f}%")
print(f"Validation accuracy after: {val_acc_after:.2f}%")
What the solution does:
Creates a new feature by multiplying the two existing features.
Appends this interaction feature to both the training and validation data.
Retrains the logistic regression model on the expanded feature set.
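The manual multiply-and-hstack step above can also be done with scikit-learn's PolynomialFeatures transformer, which generates the same cross term automatically. A minimal sketch on toy data (the array values here are illustrative, not from the experiment):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy data: 4 samples, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0]])

# interaction_only=True emits only cross terms (x0*x1), not squares;
# include_bias=False drops the constant column.
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_inter = poly.fit_transform(X)

print(X_inter.shape)  # (4, 3): columns are x0, x1, x0*x1
print(X_inter[0])     # [1. 2. 2.]
```

Using the transformer keeps train and validation preprocessing consistent (fit on train, transform both), which avoids hand-rolled reshaping mistakes.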
Results Interpretation

Before adding interaction features: Training accuracy was 75%, Validation accuracy was 72%.
After adding interaction features: Training accuracy improved to 80%, Validation accuracy improved to 79%.

Adding interaction features helps the model learn relationships between features, improving accuracy without changing the model type.
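To see why a product feature helps a linear model, consider XOR-style data, where the label depends on the sign of the product of the two features. This is a classic illustrative case (not part of the experiment above): no straight line in the original feature space separates the classes, but adding the product makes the problem linearly separable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR-style data: label depends on the product of the two features,
# so no single linear boundary in (x0, x1) separates the classes.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Plain logistic regression: near chance-level accuracy
acc_plain = LogisticRegression().fit(X, y).score(X, y)

# With the product as an extra feature, the classes become separable
X_inter = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])
acc_inter = LogisticRegression().fit(X_inter, y).score(X_inter, y)

print(f"without interaction: {acc_plain:.2f}")  # roughly 0.5
print(f"with interaction:    {acc_inter:.2f}")  # close to 1.0
```

The interaction feature turns a quadratic decision boundary in the original space into a linear one in the expanded space, which is exactly the kind of capacity logistic regression lacks on its own.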
Bonus Experiment
Try creating other types of interaction features such as addition or difference of features and check if accuracy improves further.
💡 Hint
Create new features like (feature1 + feature2) or (feature1 - feature2) and add them to the dataset before training.
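A sketch of the bonus experiment, reusing the same dataset and split as the solution above (the helper name add_combos is just an illustrative choice):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Same dataset and split as the main experiment
X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
                                                 random_state=42)

def add_combos(A):
    """Append the sum, difference, and product of the two base features."""
    s = (A[:, 0] + A[:, 1]).reshape(-1, 1)
    d = (A[:, 0] - A[:, 1]).reshape(-1, 1)
    p = (A[:, 0] * A[:, 1]).reshape(-1, 1)
    return np.hstack([A, s, d, p])

model = LogisticRegression().fit(add_combos(X_train), y_train)
val_acc = model.score(add_combos(X_val), y_val) * 100
print(f"validation accuracy: {val_acc:.2f}%")
```

One thing to look for when you run this: the sum and difference are linear combinations of the original features, so a linear model like logistic regression can already represent them, and in practice only the product tends to add new predictive capacity here.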