Imagine you have two features: hours studied and hours slept. Why might creating an interaction feature (like multiplying these two) help your model?
Think about how two things working together might have a different effect than each separately.
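Interaction features capture cases where the effect of one variable on the target depends on the value of another. For example, studying a lot while sleeping very little might affect the outcome differently than studying and sleeping both moderately. Multiplying the two features gives a linear model a term whose contribution changes with both variables at once, which it cannot represent using each feature separately.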
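A minimal sketch of the idea, assuming NumPy and made-up data: the hypothetical exam score below depends only on the product of study and sleep hours. An ordinary least-squares fit using the two features alone leaves a large error, while adding the product column lets the same linear model fit the data almost down to the noise level. The variable names and the data-generating formula are illustrative assumptions, not real measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
study = rng.uniform(0, 10, size=200)  # hypothetical hours studied
sleep = rng.uniform(4, 9, size=200)   # hypothetical hours slept
# Hypothetical target: studying pays off more when well rested (pure interaction effect)
score = 2.0 * study * sleep + rng.normal(scale=1.0, size=200)

def fit_mse(*columns):
    # Ordinary least squares with an intercept column; returns the training MSE
    X = np.column_stack([np.ones_like(study), *columns])
    coef, *_ = np.linalg.lstsq(X, score, rcond=None)
    return np.mean((score - X @ coef) ** 2)

mse_main = fit_mse(study, sleep)                 # main effects only
mse_inter = fit_mse(study, sleep, study * sleep)  # plus the interaction term
print(f"main effects only: {mse_main:.2f}")
print(f"with study*sleep:  {mse_inter:.2f}")
```

Because the target here is built from the product, this is a best case; on real data an interaction term helps only as far as such a joint effect actually exists.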
What is the output of this Python code that creates an interaction feature?
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df['A_B'] = df['A'] * df['B']
print(df['A_B'].tolist())
Multiply each element of column A by the corresponding element in column B.
The new column 'A_B' is the row-wise product of 'A' and 'B': 1*4 = 4, 2*5 = 10, 3*6 = 18, so the code prints [4, 10, 18].
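The same row-wise multiplication extends to every pair of columns. The sketch below uses a hypothetical helper, `add_pairwise_interactions`, built on `itertools.combinations`; the extra column 'C' is added only for illustration.

```python
from itertools import combinations

import pandas as pd

def add_pairwise_interactions(df, columns):
    # Hypothetical helper: adds one product column per pair, named like 'A_B'
    out = df.copy()
    for left, right in combinations(columns, 2):
        out[f"{left}_{right}"] = out[left] * out[right]
    return out

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
result = add_pairwise_interactions(df, ['A', 'B', 'C'])
print(result)
```

With k columns this adds k(k-1)/2 new features, so the feature count grows quickly as more columns are included.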
You have created many interaction features from your dataset. Which model type is best suited to automatically capture complex interactions without explicitly creating interaction features?
Think about models that split data based on feature values and can capture combinations naturally.
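Tree-based models such as decision trees and random forests. Each split is made inside the region defined by earlier splits on other features, so the effect of one feature can depend on the value of another. This lets them capture complex interactions without explicit interaction features.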
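A small illustration, assuming scikit-learn and a synthetic XOR-style target (class 1 only when exactly one feature is above 0.5). No linear boundary can separate these classes, so logistic regression on the raw features stays well short of a perfect fit, while an unrestricted decision tree fits the training data perfectly by splitting on one feature and then on the other. Training accuracy here shows capacity to represent the interaction, not generalization.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(400, 2))
# Synthetic XOR-style target: the label depends on both features jointly
y = ((X[:, 0] > 0.5) ^ (X[:, 1] > 0.5)).astype(int)

linear = LogisticRegression().fit(X, y)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

linear_acc = linear.score(X, y)  # training accuracy of a linear classifier
tree_acc = tree.score(X, y)      # training accuracy of a fully grown tree
print(f"logistic regression: {linear_acc:.2f}")
print(f"decision tree:       {tree_acc:.2f}")
```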
When using polynomial features to create interaction terms, which hyperparameter controls the maximum degree of interactions included?
It controls how many features are multiplied together.
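The degree parameter sets the maximum total degree of the generated terms, which also caps the interaction order: degree=2 yields products of at most two features (such as A*B), and degree=3 products of up to three (such as A*B*C), plus powers such as A^2 unless the generator is restricted to interaction terms.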
You add interaction features to a regression model. After training, the training error decreases but the validation error increases. What does this indicate?
Think about what it means when training error goes down but validation error goes up.
It indicates overfitting: the model fits the training data too closely, including its noise, and fails to generalize to new data. Adding many interaction terms increases model complexity, which makes this more likely, especially when the training set is small relative to the number of features.
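A minimal sketch of the symptom, assuming scikit-learn and synthetic data. The hypothetical target is linear in six features plus noise, so the 15 pairwise interaction terms add capacity but no real signal. With a deliberately tiny training set of 22 rows, the expanded model (21 features plus an intercept) can fit the training rows almost exactly, while its error on fresh validation rows rises above that of the plain linear model. Exact numbers depend on the random seed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
true_coef = np.array([1.0, -2.0, 0.5, 0.0, 1.5, -1.0])

def make_data(n):
    # Hypothetical data: linear in six features plus noise, with no true interactions
    X = rng.normal(size=(n, 6))
    y = X @ true_coef + rng.normal(scale=1.0, size=n)
    return X, y

X_train, y_train = make_data(22)  # deliberately small training set
X_val, y_val = make_data(1000)

def train_and_score(transform):
    # Fit on transformed training data; return (training MSE, validation MSE)
    Xt_train, Xt_val = transform(X_train), transform(X_val)
    model = LinearRegression().fit(Xt_train, y_train)
    return (mean_squared_error(y_train, model.predict(Xt_train)),
            mean_squared_error(y_val, model.predict(Xt_val)))

base_train, base_val = train_and_score(lambda X: X)

poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False).fit(X_train)
inter_train, inter_val = train_and_score(poly.transform)

print(f"linear:       train {base_train:.3f}, validation {base_val:.3f}")
print(f"interactions: train {inter_train:.3f}, validation {inter_val:.3f}")
```

Common remedies include regularization (for example ridge or lasso penalties), keeping only interactions with a plausible rationale, or collecting more training data.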