
Creating interaction features in ML Python

Introduction

Interaction features help models learn how two or more things together affect the result. They show combined effects that single features alone might miss.

Use interaction features in these situations:
  • When you suspect two features together change the outcome differently than each does alone, such as age and exercise affecting health.
  • When single features don't explain the data well and you want to add more detail.
  • When you want to improve model accuracy by capturing relationships between features.
  • When working with linear models, which don't learn feature combinations on their own.
  • When you want to explore how features work together before training a model.
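
The point about linear models can be checked on synthetic data. The sketch below is illustrative only: the data is randomly generated, the target is defined as exactly x1 * x2, and the variable names are made up for this example. It fits the same linear model twice, once on the raw features and once with the product column added, and compares the in-sample R² scores.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): the target depends
# only on the product of the two features.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
y = x1 * x2

# Linear model on the raw features only
X_plain = np.column_stack([x1, x2])
r2_plain = LinearRegression().fit(X_plain, y).score(X_plain, y)

# Same model with the interaction column added
X_inter = np.column_stack([x1, x2, x1 * x2])
r2_inter = LinearRegression().fit(X_inter, y).score(X_inter, y)

print(f"R^2 without interaction: {r2_plain:.3f}")
print(f"R^2 with interaction:    {r2_inter:.3f}")
```

Because the target here is purely a product, the raw features explain very little of it, while the model with the product column fits it almost exactly. Real data is rarely this clean, so the gap is usually smaller.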
Syntax
ML Python
interaction_feature = feature1 * feature2

You create interaction features by multiplying two or more features.

This works directly for numeric features. Categorical features must be encoded as numbers first, for example with one-hot encoding.

Examples
This creates a new feature by multiplying age and exercise hours.
ML Python
df['age_exercise'] = df['age'] * df['exercise_hours']
Combines income and education years to capture their joint effect.
ML Python
df['income_education'] = df['income'] * df['education_years']
For categorical features encoded as numbers, multiply the encoded columns. With 0/1 encodings, the product is 1 only when both features are 1.
ML Python
df['gender_smoking'] = df['gender_encoded'] * df['smoking_status_encoded']
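
If the categorical columns are still raw strings, they need encoding before they can be multiplied. A minimal sketch, assuming hypothetical 'gender' and 'smoker' columns and using pandas' get_dummies for one-hot encoding:

```python
import pandas as pd

# Hypothetical raw data; column names and values are for illustration only
df = pd.DataFrame({
    'gender': ['Male', 'Female', 'Male', 'Female'],
    'smoker': ['Yes', 'Yes', 'No', 'No'],
})

# One-hot encode both columns as 0/1 integers
dummies = pd.get_dummies(df[['gender', 'smoker']], dtype=int)

# The product is 1 only for rows that are both Male and smoker == 'Yes'
df['male_smoker'] = dummies['gender_Male'] * dummies['smoker_Yes']

print(df)
```

The same pattern applies to any other pair of one-hot columns, such as Female and Yes.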
Sample Model

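This example creates an interaction feature by multiplying age and exercise hours, then trains a simple linear model to predict a health score. It prints the mean squared error on the test set and the predictions. With only five rows, the numbers illustrate the workflow rather than real model quality.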

ML Python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Sample data
data = {
    'age': [25, 32, 47, 51, 62],
    'exercise_hours': [3, 0, 1, 4, 2],
    'health_score': [80, 60, 70, 85, 75]
}
df = pd.DataFrame(data)

# Create interaction feature
df['age_exercise'] = df['age'] * df['exercise_hours']

# Features and target
X = df[['age', 'exercise_hours', 'age_exercise']]
y = df['health_score']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Calculate error
mse = mean_squared_error(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"Predictions: {y_pred.round(2).tolist()}")
Important Notes

Interaction features add columns and model complexity: n features have n(n-1)/2 possible pairwise products. Add only the interactions you have reason to expect.

Always check if interaction features improve your model by comparing metrics.
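
One way to make this comparison is cross-validation on the same data with and without the interaction column. The sketch below assumes synthetic data whose target was built with a product term, so the outcome is known in advance. The helper cv_mse and the column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data (assumed for illustration): target includes a product term plus noise
rng = np.random.default_rng(42)
df = pd.DataFrame({
    'age': rng.uniform(20, 70, size=200),
    'exercise_hours': rng.uniform(0, 5, size=200),
})
df['health_score'] = (
    90 - 0.3 * df['age']
    + 0.1 * df['age'] * df['exercise_hours']
    + rng.normal(scale=1.0, size=200)
)
df['age_exercise'] = df['age'] * df['exercise_hours']

y = df['health_score']
base_cols = ['age', 'exercise_hours']
inter_cols = base_cols + ['age_exercise']

def cv_mse(columns):
    """Mean 5-fold cross-validated MSE for a linear model on the given columns."""
    scores = cross_val_score(
        LinearRegression(), df[columns], y,
        cv=5, scoring='neg_mean_squared_error',
    )
    return -scores.mean()

mse_base = cv_mse(base_cols)
mse_inter = cv_mse(inter_cols)

print(f"CV MSE without interaction: {mse_base:.2f}")
print(f"CV MSE with interaction:    {mse_inter:.2f}")
```

On real data, keep the interaction only if the held-out metric improves. A lower training error by itself is not enough.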

With many features, writing every product by hand is impractical. Consider automated tools to generate interactions, then keep only the ones that help.

Summary

Interaction features combine two or more features to capture their joint effect.

They help models learn relationships that single features miss.

They are created by multiplying numeric features or encoded categorical features.

Practice

1. What is the main purpose of creating interaction features in machine learning?
easy
A. To capture the combined effect of two or more features on the target
B. To reduce the number of features in the dataset
C. To normalize the features to a common scale
D. To remove irrelevant features automatically

Solution

  1. Step 1: Understand interaction features

    Interaction features combine two or more features to capture their joint effect on the target variable.
  2. Step 2: Compare options

    Only option A, capturing the combined effect of two or more features on the target, describes the purpose of interaction features. The other options describe feature reduction, scaling, and feature selection.
  3. Final Answer:

    To capture the combined effect of two or more features on the target -> Option A
  4. Quick Check:

    Interaction features = combined effect [OK]
Hint: Interaction features capture combined effects of features [OK]
Common Mistakes:
  • Confusing interaction features with feature scaling
  • Thinking interaction features reduce feature count
  • Assuming interaction features remove irrelevant features
2. Which of the following is the correct way to create an interaction feature between two numeric features x1 and x2 in Python?
easy
A. interaction = x1 * x2
B. interaction = x1 - x2
C. interaction = x1 / x2
D. interaction = x1 + x2

Solution

  1. Step 1: Recall how interaction features are created

    Interaction features are typically created by multiplying numeric features to capture their joint effect.
  2. Step 2: Check each option

    Only multiplication (x1 * x2) correctly creates an interaction feature.
  3. Final Answer:

    interaction = x1 * x2 -> Option A
  4. Quick Check:

    Interaction = multiply features [OK]
Hint: Multiply numeric features to create interaction features [OK]
Common Mistakes:
  • Using addition instead of multiplication
  • Using division or subtraction which do not capture interaction
  • Confusing interaction with feature scaling
3. Given the code below, what will be the output of print(df['interaction'].tolist())?
import pandas as pd

df = pd.DataFrame({'x1': [1, 2, 3], 'x2': [4, 5, 6]})
df['interaction'] = df['x1'] * df['x2']
print(df['interaction'].tolist())
medium
A. [4, 5, 6]
B. [5, 7, 9]
C. [1, 2, 3]
D. [4, 10, 18]

Solution

  1. Step 1: Calculate interaction feature values

    Multiply each pair: 1*4=4, 2*5=10, 3*6=18.
  2. Step 2: Verify output list

    The list of interaction values is [4, 10, 18].
  3. Final Answer:

    [4, 10, 18] -> Option D
  4. Quick Check:

    Multiplying pairs = [4, 10, 18] [OK]
Hint: Multiply row-wise values for interaction feature list [OK]
Common Mistakes:
  • Adding instead of multiplying features
  • Confusing original features with interaction
  • Misreading the DataFrame values
4. The following code attempts to create an interaction feature between two categorical features color and shape. What is the error?
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue'], 'shape': ['circle', 'square']})
df['interaction'] = df['color'] * df['shape']
print(df['interaction'])
medium
A. DataFrame columns must be numeric to create interaction
B. The DataFrame is missing a target column
C. You cannot multiply string columns directly; need encoding first
D. The print statement syntax is incorrect

Solution

  1. Step 1: Understand data types for interaction

    Multiplying string columns causes an error because strings cannot be multiplied directly.
  2. Step 2: Identify correct approach

    Categorical features must be encoded (e.g., one-hot or label encoding) before creating interaction features.
  3. Final Answer:

    You cannot multiply string columns directly; need encoding first -> Option C
  4. Quick Check:

    Multiply strings error = need encoding [OK]
Hint: Encode categorical features before multiplying [OK]
Common Mistakes:
  • Trying to multiply raw string columns
  • Ignoring data type requirements for interaction
  • Assuming print syntax is wrong
5. You have two categorical features: Gender with values ['Male', 'Female'] and Smoker with values ['Yes', 'No']. How would you create an interaction feature to help a model learn their combined effect?
hard
A. Multiply the raw string columns directly
B. One-hot encode both features, then multiply corresponding columns
C. Add the string values together as new strings
D. Ignore interaction features for categorical data

Solution

  1. Step 1: Encode categorical features

    Convert 'Gender' and 'Smoker' into one-hot encoded numeric columns.
  2. Step 2: Create interaction features

    Multiply corresponding one-hot columns (e.g., Male*Yes) to capture combined effect.
  3. Final Answer:

    One-hot encode both features, then multiply corresponding columns -> Option B
  4. Quick Check:

    Encode then multiply categorical features [OK]
Hint: One-hot encode then multiply for categorical interaction [OK]
Common Mistakes:
  • Trying to multiply raw strings
  • Concatenating strings instead of encoding
  • Skipping interaction features for categorical data