Introduction
Engineered features help models learn better by giving them clearer and more useful information from the data.
Feature engineering has no fixed syntax: it means creating new data columns or transforming existing ones using code or tools. Typical pandas examples:
df['age_squared'] = df['age'] ** 2
df['income_per_person'] = df['income'] / df['family_size']
df['is_weekend'] = df['day_of_week'].apply(lambda x: 1 if x in ['Saturday', 'Sunday'] else 0)
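To see what these three transformations produce, here is a minimal sketch on a small DataFrame. The sample rows are made up for illustration and do not come from the tutorial. It also shows isin as a vectorized alternative to apply for the weekend flag; both give the same result here.

```python
import pandas as pd

# Hypothetical sample rows, invented only for illustration
df = pd.DataFrame({
    'age': [25, 40],
    'income': [50000, 90000],
    'family_size': [2, 3],
    'day_of_week': ['Saturday', 'Monday'],
})

# Polynomial feature: square of an existing column
df['age_squared'] = df['age'] ** 2

# Ratio feature: combine two columns into one
df['income_per_person'] = df['income'] / df['family_size']

# Binary flag with apply, as in the lines above
df['is_weekend'] = df['day_of_week'].apply(
    lambda x: 1 if x in ['Saturday', 'Sunday'] else 0
)

# Same flag without a Python-level loop: isin + cast to int
is_weekend_vectorized = df['day_of_week'].isin(['Saturday', 'Sunday']).astype(int)

print(df[['age_squared', 'income_per_person', 'is_weekend']])
```

On large DataFrames the isin version is usually the faster of the two, since it avoids calling a Python function once per row.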
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Sample data
data = {'age': [25, 32, 47, 51, 62],
        'income': [50000, 60000, 80000, 90000, 120000],
        'family_size': [3, 4, 2, 5, 3],
        'house_price': [200000, 250000, 320000, 360000, 400000]}

# Create DataFrame
df = pd.DataFrame(data)

# Feature engineering: create income per person
df['income_per_person'] = df['income'] / df['family_size']

# Prepare features and target
X = df[['age', 'income', 'family_size']]
X_eng = df[['age', 'income', 'family_size', 'income_per_person']]
y = df['house_price']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
X_eng_train, X_eng_test, y_train_eng, y_test_eng = train_test_split(X_eng, y, random_state=42)

# Train model without engineered feature
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mse_without = mean_squared_error(y_test, y_pred)

# Train model with engineered feature
model_eng = LinearRegression()
model_eng.fit(X_eng_train, y_train_eng)
y_pred_eng = model_eng.predict(X_eng_test)
mse_with = mean_squared_error(y_test_eng, y_pred_eng)

print(f"MSE without engineered feature: {mse_without:.2e}")
print(f"MSE with engineered feature: {mse_with:.2e}")
How do you create age_group from an age column in Python using pandas?
Using apply with a lambda function lets us assign categories like 'young' or 'old' based on age, so it creates age_group correctly. df['age_group'] = df['age'] > 30 creates a boolean, not a group. The sum and mean options compute sums or means, not groups.
What does print(df) show after this feature engineering?
import pandas as pd
df = pd.DataFrame({'temp_c': [0, 20, 30]})
df['temp_f'] = df['temp_c'] * 9/5 + 32
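print(df)
The output shows the original temp_c column next to a new temp_f column holding 32.0, 68.0 and 86.0.
This code tries to create is_adult but it gives wrong results. What is the bug?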
df['is_adult'] = df['age'] > '18'
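The code compares the age values to the string '18' instead of the number 18, which leads to wrong boolean results (or, for a numeric column in recent pandas versions, can raise a TypeError). Comparing against the number fixes it: df['age'] > 18.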
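The age_group and is_adult answers above can be checked with a short sketch. The ages, the cutoff of 30, and the labels 'young' and 'old' are illustrative assumptions, not values from the tutorial. The second half shows one way the string comparison can go wrong when ages are stored as text, and how converting with pd.to_numeric avoids it.

```python
import pandas as pd

# Hypothetical ages, invented for illustration
df = pd.DataFrame({'age': [12, 18, 35, 70]})

# age_group: apply with a lambda assigns one label per row
# (cutoff 30 and the labels are illustrative choices)
df['age_group'] = df['age'].apply(lambda a: 'young' if a <= 30 else 'old')

# is_adult: compare against the number 18, not the string '18'
df['is_adult'] = df['age'] > 18

print(df)

# If ages arrive as text, a string comparison is lexicographic:
# '9' sorts after '18' because '9' > '1' character by character
age_text = pd.Series(['9', '18', '35'])
lexicographic = age_text > '18'

# Converting to numbers first gives the intended comparison
numeric = pd.to_numeric(age_text) > 18

print(lexicographic.tolist())
print(numeric.tolist())
```

Checking df.dtypes before writing comparisons is a cheap way to catch this kind of mismatch early.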