
Creating interaction features in ML Python - Experiment Walkthrough

Experiment - Creating interaction features
Problem: You have a dataset with two features, and you want to improve a simple model's accuracy by adding interaction features that combine these two features.
Current Metrics: Training accuracy: 75%, Validation accuracy: 72%
Issue: The model is not capturing relationships between features well, limiting accuracy.
Your Task
Increase validation accuracy to at least 78% by creating interaction features from the existing features.
You can only add interaction features (multiplication, addition, or other combinations) of the existing features.
Do not change the model type or hyperparameters.
Use the same train/test split.
Solution
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Create a simple dataset
X, y = make_classification(n_samples=500, n_features=2, n_informative=2, n_redundant=0, random_state=42)

# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model without interaction features
model = LogisticRegression()
model.fit(X_train, y_train)

train_pred = model.predict(X_train)
val_pred = model.predict(X_val)

train_acc_before = accuracy_score(y_train, train_pred) * 100
val_acc_before = accuracy_score(y_val, val_pred) * 100

# Create interaction features
# Multiply feature 0 and feature 1
interaction_feature = (X_train[:, 0] * X_train[:, 1]).reshape(-1, 1)
interaction_feature_val = (X_val[:, 0] * X_val[:, 1]).reshape(-1, 1)

# Add interaction feature to original features
X_train_new = np.hstack([X_train, interaction_feature])
X_val_new = np.hstack([X_val, interaction_feature_val])

# Train model with interaction features
model_new = LogisticRegression()
model_new.fit(X_train_new, y_train)

train_pred_new = model_new.predict(X_train_new)
val_pred_new = model_new.predict(X_val_new)

train_acc_after = accuracy_score(y_train, train_pred_new) * 100
val_acc_after = accuracy_score(y_val, val_pred_new) * 100

print(f"Training accuracy before: {train_acc_before:.2f}%")
print(f"Validation accuracy before: {val_acc_before:.2f}%")
print(f"Training accuracy after: {train_acc_after:.2f}%")
print(f"Validation accuracy after: {val_acc_after:.2f}%")
Created a new feature by multiplying the two existing features.
Added this new interaction feature to the training and validation data.
Retrained the logistic regression model with the new features.
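The same product column can also be generated with scikit-learn's PolynomialFeatures instead of by hand. The sketch below is one possible alternative, not part of the original solution; X_demo is a hypothetical stand-in for X_train, and the comparison only checks that both routes produce the same matrix.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in for X_train: 5 rows, 2 numeric features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 2))

# interaction_only=True keeps x0, x1 and x0*x1 but drops x0^2 and x1^2;
# include_bias=False drops the constant column of ones
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_demo_new = poly.fit_transform(X_demo)

# Manual version, built the same way as in the solution above
manual = np.hstack([X_demo, (X_demo[:, 0] * X_demo[:, 1]).reshape(-1, 1)])

same = np.allclose(X_demo_new, manual)
print(X_demo_new.shape, same)
```

In a real workflow you would likely call fit_transform on the training split and transform on the validation split, so both go through the same fitted transformer.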
Results Interpretation

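Before adding interaction features (the scenario's baseline): Training accuracy was 75%, Validation accuracy was 72%.
After adding interaction features (as reported for the scenario): Training accuracy improved to 80%, Validation accuracy improved to 79%.
Note that the script above prints its own figures for the synthetic make_classification data, and these may differ from the scenario numbers.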

Adding interaction features can help a linear model such as logistic regression capture effects that depend on both features at once, improving accuracy without changing the model type.
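To see why a product term can matter for a linear model, here is a small illustrative sketch on hypothetical synthetic data (not the dataset used above) in which the label depends only on the sign of x0 * x1. The variable names and the data-generating rule are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical XOR-like data: class 1 when both features share the same sign
rng = np.random.default_rng(0)
X_xor = rng.uniform(-1, 1, size=(1000, 2))
y_xor = (X_xor[:, 0] * X_xor[:, 1] > 0).astype(int)

X_tr, X_va, y_tr, y_va = train_test_split(
    X_xor, y_xor, test_size=0.2, random_state=0
)

# Baseline: logistic regression on the two raw features
acc_plain = LogisticRegression().fit(X_tr, y_tr).score(X_va, y_va)

# Same model type, with the product column appended
def add_product(X):
    return np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])

acc_inter = (
    LogisticRegression()
    .fit(add_product(X_tr), y_tr)
    .score(add_product(X_va), y_va)
)

print(f"Validation accuracy without product term: {acc_plain:.2f}")
print(f"Validation accuracy with product term:    {acc_inter:.2f}")
```

On data like this, no single straight-line boundary in the raw features separates the classes, so the baseline should sit near chance, while the model with the product column should do much better.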
Bonus Experiment
Try creating other types of interaction features such as addition or difference of features and check if accuracy improves further.
💡 Hint
Create new features like (feature1 + feature2) or (feature1 - feature2) and add them to the dataset before training.
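One caveat before running the bonus experiment: a sum or difference of two features is a linear combination of columns the model already has, so for a linear model like logistic regression it adds no new direction to the feature space. Any change in accuracy would then come from how regularization spreads the weights, not from new information. A quick rank check on hypothetical random data illustrates this; X_demo is an assumption made for the example.

```python
import numpy as np

# Hypothetical data: 100 rows, 2 numeric features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))

x_sum = (X_demo[:, 0] + X_demo[:, 1]).reshape(-1, 1)
x_diff = (X_demo[:, 0] - X_demo[:, 1]).reshape(-1, 1)
x_prod = (X_demo[:, 0] * X_demo[:, 1]).reshape(-1, 1)

# Sum and difference columns do not raise the rank above 2
rank_sum_diff = np.linalg.matrix_rank(np.hstack([X_demo, x_sum, x_diff]))

# The product column is not a linear combination, so the rank becomes 3
rank_prod = np.linalg.matrix_rank(np.hstack([X_demo, x_prod]))

print(rank_sum_diff, rank_prod)
```

Sums and differences may still be worth trying with non-linear models, where the argument above does not apply in the same way.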

Practice

1. What is the main purpose of creating interaction features in machine learning?
easy
A. To capture the combined effect of two or more features on the target
B. To reduce the number of features in the dataset
C. To normalize the features to a common scale
D. To remove irrelevant features automatically

Solution

  1. Step 1: Understand interaction features

    Interaction features combine two or more features to capture their joint effect on the target variable.
  2. Step 2: Compare options

    Only option A, "To capture the combined effect of two or more features on the target", describes the purpose of interaction features. The other options describe feature reduction, scaling, and feature selection.
  3. Final Answer:

    To capture the combined effect of two or more features on the target -> Option A
  4. Quick Check:

    Interaction features = combined effect [OK]
Hint: Interaction features capture combined effects of features [OK]
Common Mistakes:
  • Confusing interaction features with feature scaling
  • Thinking interaction features reduce feature count
  • Assuming interaction features remove irrelevant features
2. Which of the following is the correct way to create an interaction feature between two numeric features x1 and x2 in Python?
easy
A. interaction = x1 * x2
B. interaction = x1 - x2
C. interaction = x1 / x2
D. interaction = x1 + x2

Solution

  1. Step 1: Recall how interaction features are created

    Interaction features are typically created by multiplying numeric features to capture their joint effect.
  2. Step 2: Check each option

    Among the options, only multiplication (x1 * x2) produces the standard interaction term.
  3. Final Answer:

    interaction = x1 * x2 -> Option A
  4. Quick Check:

    Interaction = multiply features [OK]
Hint: Multiply numeric features to create interaction features [OK]
Common Mistakes:
  • Using addition instead of multiplication
  • Using division or subtraction, which do not produce the standard interaction term
  • Confusing interaction with feature scaling
3. Given the code below, what will be the output of print(df['interaction'].tolist())?
import pandas as pd

df = pd.DataFrame({'x1': [1, 2, 3], 'x2': [4, 5, 6]})
df['interaction'] = df['x1'] * df['x2']
print(df['interaction'].tolist())
medium
A. [4, 5, 6]
B. [5, 7, 9]
C. [1, 2, 3]
D. [4, 10, 18]

Solution

  1. Step 1: Calculate interaction feature values

    Multiply each pair: 1*4=4, 2*5=10, 3*6=18.
  2. Step 2: Verify output list

    The list of interaction values is [4, 10, 18].
  3. Final Answer:

    [4, 10, 18] -> Option D
  4. Quick Check:

    Multiplying pairs = [4, 10, 18] [OK]
Hint: Multiply row-wise values for interaction feature list [OK]
Common Mistakes:
  • Adding instead of multiplying features
  • Confusing original features with interaction
  • Misreading the DataFrame values
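The same row-wise multiplication extends to every pair of columns. The sketch below is an illustration only: the x3 column and the "a_x_b" naming scheme are hypothetical additions, not part of the question.

```python
from itertools import combinations

import pandas as pd

# x1 and x2 come from the question; x3 is a hypothetical extra column
df = pd.DataFrame({"x1": [1, 2, 3], "x2": [4, 5, 6], "x3": [7, 8, 9]})

# Freeze the list of base columns before adding new ones to the frame
base_cols = list(df.columns)

# One product column per unordered pair of base columns
for a, b in combinations(base_cols, 2):
    df[f"{a}_x_{b}"] = df[a] * df[b]

print(df.columns.tolist())
print(df["x1_x_x2"].tolist())  # [4, 10, 18], matching the answer above
```

The number of pairs grows quadratically with the number of columns, so in practice you may want to restrict this to pairs you have a reason to combine.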
4. The following code attempts to create an interaction feature between two categorical features color and shape. What is the error?
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue'], 'shape': ['circle', 'square']})
df['interaction'] = df['color'] * df['shape']
print(df['interaction'])
medium
A. DataFrame columns must be numeric to create interaction
B. The DataFrame is missing a target column
C. You cannot multiply string columns directly; need encoding first
D. The print statement syntax is incorrect

Solution

  1. Step 1: Understand data types for interaction

    Multiplying two string columns raises a TypeError, because a string can only be multiplied by an integer (repetition), not by another string.
  2. Step 2: Identify correct approach

    Categorical features must be encoded (e.g., one-hot or label encoding) before creating interaction features.
  3. Final Answer:

    You cannot multiply string columns directly; need encoding first -> Option C
  4. Quick Check:

    Multiply strings error = need encoding [OK]
Hint: Encode categorical features before multiplying [OK]
Common Mistakes:
  • Trying to multiply raw string columns
  • Ignoring data type requirements for interaction
  • Assuming print syntax is wrong
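The failure can be observed directly by wrapping the multiplication in a try/except. This is a minimal sketch that assumes pandas raises a TypeError for string-by-string multiplication, which is the behavior in recent versions; the failed flag is a hypothetical helper for the example.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue"], "shape": ["circle", "square"]})

try:
    # String columns cannot be multiplied element-wise by each other
    df["interaction"] = df["color"] * df["shape"]
    failed = False
except TypeError as err:
    failed = True
    print(f"TypeError: {err}")

# The assignment never completed, so no 'interaction' column was created
print("interaction" in df.columns)
```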
5. You have two categorical features: Gender with values ['Male', 'Female'] and Smoker with values ['Yes', 'No']. How would you create an interaction feature to help a model learn their combined effect?
hard
A. Multiply the raw string columns directly
B. One-hot encode both features, then multiply corresponding columns
C. Add the string values together as new strings
D. Ignore interaction features for categorical data

Solution

  1. Step 1: Encode categorical features

    Convert 'Gender' and 'Smoker' into one-hot encoded numeric columns.
  2. Step 2: Create interaction features

    Multiply a one-hot column from one feature by a one-hot column from the other (e.g., Male*Yes) to capture their combined effect.
  3. Final Answer:

    One-hot encode both features, then multiply corresponding columns -> Option B
  4. Quick Check:

    Encode then multiply categorical features [OK]
Hint: One-hot encode then multiply for categorical interaction [OK]
Common Mistakes:
  • Trying to multiply raw strings
  • Concatenating strings instead of encoding
  • Skipping interaction features for categorical data
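The two steps in this solution can be sketched with pandas as follows. The four sample rows and the Male_x_SmokerYes column name are hypothetical; pd.get_dummies is used here as one common way to one-hot encode, and other encoders would work too.

```python
import pandas as pd

# Hypothetical sample rows using the category values from the question
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Smoker": ["Yes", "No", "No", "Yes"],
})

# Step 1: one-hot encode both features as 0/1 integer columns
encoded = pd.get_dummies(df, columns=["Gender", "Smoker"], dtype=int)

# Step 2: multiply one column from each feature; the result is 1 only
# for rows that are both Male and Smoker == Yes
encoded["Male_x_SmokerYes"] = encoded["Gender_Male"] * encoded["Smoker_Yes"]

print(encoded.columns.tolist())
print(encoded["Male_x_SmokerYes"].tolist())  # [1, 0, 0, 0]
```

The remaining combinations (for example Female and Yes) can be built the same way if the model needs them.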