Challenge - 5 Problems
Feature Engineering Master
❓ Predict Output
intermediate
Output of creating a new feature by combining columns
What is the output DataFrame after running this code that creates a new feature by combining two columns?
Pandas
import pandas as pd

df = pd.DataFrame({
    'height_cm': [170, 180, 160],
    'weight_kg': [70, 80, 60]
})
# BMI = weight in kg / (height in meters) squared
df['bmi'] = df['weight_kg'] / ((df['height_cm'] / 100) ** 2)
print(df)
💡 Hint
Calculate BMI as weight divided by height in meters squared.
Explanation
The new column 'bmi' is calculated by dividing weight in kg by the square of height in meters. The result is a new column added to the DataFrame.
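As a quick check of the arithmetic, the sketch below recomputes the column and rounds it to two decimals. The intermediate `height_m` and the list `bmi_rounded` are names introduced here for readability; they are not part of the challenge code.

```python
import pandas as pd

df = pd.DataFrame({
    'height_cm': [170, 180, 160],
    'weight_kg': [70, 80, 60],
})

# Convert height from centimeters to meters.
height_m = df['height_cm'] / 100

# BMI = weight (kg) divided by height (m) squared.
df['bmi'] = df['weight_kg'] / height_m ** 2

# Rounded only for display; df['bmi'] keeps full precision.
bmi_rounded = df['bmi'].round(2).tolist()
print(bmi_rounded)  # [24.22, 24.69, 23.44]
```

The original two columns are unchanged; 'bmi' is appended as a third column.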
❓ data_output
intermediate
Number of unique categories after encoding
After applying one-hot encoding to the 'color' column, how many new columns are created?
Pandas
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})
# One-hot encode 'color'; the original column is replaced by indicator columns
df_encoded = pd.get_dummies(df, columns=['color'])
print(df_encoded.columns.tolist())
💡 Hint
Count unique values in the 'color' column.
Explanation
There are three unique colors: red, blue, and green. One-hot encoding creates one column per unique category.
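A minimal sketch that compares the number of unique categories with the columns `get_dummies` produces. The names `n_unique`, `encoded_cols`, and `df_reduced` are introduced here for illustration, and the `drop_first=True` variant is shown only as a contrast to the challenge code.

```python
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})

# Number of distinct categories in the column.
n_unique = df['color'].nunique()

# One indicator column per category, named '<column>_<category>'.
df_encoded = pd.get_dummies(df, columns=['color'])
encoded_cols = df_encoded.columns.tolist()
print(n_unique, encoded_cols)
# 3 ['color_blue', 'color_green', 'color_red']

# With drop_first=True the first category is dropped, leaving k - 1 columns.
df_reduced = pd.get_dummies(df, columns=['color'], drop_first=True)
print(df_reduced.columns.tolist())
```

The indicator columns appear in sorted category order, not in order of first appearance.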
🔧 Debug
advanced
Identify the error in feature scaling code
What error does this code raise when trying to scale a feature using Min-Max scaling?
Pandas
import pandas as pd

df = pd.DataFrame({'score': [10, 20, 30, 40, 50]})
min_val = df['score'].min()
max_val = df['score'].max()
# Min-Max scaling: (x - min) / (max - min)
df['scaled'] = (df['score'] - min_val) / (max_val - min_val)
print(df)
💡 Hint
Check the formula for Min-Max scaling carefully.
Explanation
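The code as shown divides by (max_val - min_val), which is the standard Min-Max formula, so it raises no error and 'scaled' contains 0.0, 0.25, 0.5, 0.75, 1.0. A variant that divided by max_val instead of (max_val - min_val) would also run without error, but the scaling would be incorrect.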
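The difference between dividing by (max_val - min_val) and dividing by max_val alone can be seen in a small sketch. The column name `scaled_wrong` is hypothetical and added here only for comparison.

```python
import pandas as pd

df = pd.DataFrame({'score': [10, 20, 30, 40, 50]})
min_val = df['score'].min()
max_val = df['score'].max()

# Standard Min-Max scaling: results span exactly [0, 1].
df['scaled'] = (df['score'] - min_val) / (max_val - min_val)

# Hypothetical buggy variant: divides by max_val only.
# It runs without error, but the maximum no longer maps to 1.
df['scaled_wrong'] = (df['score'] - min_val) / max_val

print(df['scaled'].tolist())        # [0.0, 0.25, 0.5, 0.75, 1.0]
print(df['scaled_wrong'].tolist())  # [0.0, 0.2, 0.4, 0.6, 0.8]
```

Neither version raises an exception, which is why a mistake like this has to be caught by checking the output range and not by waiting for an error.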
❓ visualization
advanced
Interpret the bar chart of a newly created feature
Given this code that creates a new feature 'age_group' and plots the count of each group as a bar chart, what will the chart show?
Pandas
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'age': [15, 22, 37, 45, 52, 67, 70]})
# Bin ages into labeled groups; intervals are right-inclusive by default
df['age_group'] = pd.cut(df['age'],
                         bins=[0, 18, 35, 60, 100],
                         labels=['Child', 'Young Adult', 'Adult', 'Senior'])
# Plot the number of rows in each group
df['age_group'].value_counts().plot(kind='bar')
plt.show()
💡 Hint
Check which ages fall into each bin range.
Explanation
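The bins split ages into four right-inclusive groups: Child (0, 18], Young Adult (18, 35], Adult (35, 60], and Senior (60, 100]. Counting how many ages fall into each group gives the bar heights: Child 1, Young Adult 1, Adult 3, Senior 2.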
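The group counts can be checked without drawing the plot. This sketch reuses the challenge's bins and labels; the `sort_index()` call is an addition so the counts print in bin order, where the challenge code lets `value_counts()` order them by frequency.

```python
import pandas as pd

df = pd.DataFrame({'age': [15, 22, 37, 45, 52, 67, 70]})

# pd.cut uses right-inclusive intervals by default: (0, 18], (18, 35], ...
df['age_group'] = pd.cut(
    df['age'],
    bins=[0, 18, 35, 60, 100],
    labels=['Child', 'Young Adult', 'Adult', 'Senior'],
)

# Count rows per group, then order by the category (bin) order.
counts = df['age_group'].value_counts().sort_index()
print(counts.to_dict())
# {'Child': 1, 'Young Adult': 1, 'Adult': 3, 'Senior': 2}
```

Each bar in the chart has the height of the matching count, so 'Adult' is the tallest bar.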
🚀 Application
expert
Choosing the best feature transformation for skewed data
You have a feature with a highly skewed distribution. Which transformation is most appropriate to reduce skewness before modeling?
💡 Hint
Think about transformations that reduce skewness, not just scale data.
Explanation
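Log transformation reduces right (positive) skewness by compressing large values. It requires strictly positive inputs; log(1 + x) is a common variant when zeros are present. Linear scaling methods such as Min-Max scaling or standardization only shift and rescale the data, so they do not reduce skewness.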
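A small sketch of this contrast, under stated assumptions: the sample values are made up for illustration, and `np.log1p` (log of 1 + x) stands in for the log transformation. Skewness is measured with pandas' `Series.skew()`.

```python
import numpy as np
import pandas as pd

# Made-up right-skewed sample: most values are small, one is very large.
values = pd.Series([1, 2, 3, 4, 5, 10, 20, 50, 100, 1000], dtype=float)

# Log transformation (log1p also handles zeros safely).
log_values = np.log1p(values)

# Min-Max scaling is a linear rescaling to the range [0, 1].
minmax_values = (values - values.min()) / (values.max() - values.min())

skew_raw = values.skew()
skew_log = log_values.skew()
skew_minmax = minmax_values.skew()

print(f"raw:     {skew_raw:.3f}")
print(f"log1p:   {skew_log:.3f}")
print(f"min-max: {skew_minmax:.3f}")
```

On this sample the log-transformed skewness is much smaller than the raw skewness, while the Min-Max value matches the raw one, because a positive linear rescaling does not change the shape of the distribution.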