
Feature engineering basics in Pandas - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of creating a new feature by combining columns
Which option matches the contents of the DataFrame printed by this code, which creates a new feature by combining two columns? (BMI values are shown rounded to two decimals.)
Pandas
import pandas as pd

df = pd.DataFrame({
    'height_cm': [170, 180, 160],
    'weight_kg': [70, 80, 60]
})
df['bmi'] = df['weight_kg'] / ((df['height_cm'] / 100) ** 2)
print(df)
A. {'height_cm': [170, 180, 160], 'weight_kg': [70, 80, 60], 'bmi': [24.22, 24.69, 23.44]}
B. {'height_cm': [170, 180, 160], 'weight_kg': [70, 80, 60], 'bmi': [24.22, 24.69, 23.44, 25.0]}
C. {'height_cm': [170, 180, 160], 'weight_kg': [70, 80, 60]}
D. SyntaxError
💡 Hint
Calculate BMI as weight divided by height in meters squared.
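The hint can be turned into a small reusable helper. The sketch below is only an illustration: the function name `add_bmi` and the sample rows are hypothetical, and the data is deliberately different from the challenge so the arithmetic is easy to check by hand.

```python
import pandas as pd

def add_bmi(df, height_col="height_cm", weight_col="weight_kg"):
    """Return a copy of df with a 'bmi' column (kg / m^2), rounded to 2 decimals."""
    out = df.copy()                   # leave the caller's DataFrame untouched
    height_m = out[height_col] / 100  # centimetres -> metres
    out["bmi"] = (out[weight_col] / height_m ** 2).round(2)
    return out

# Hypothetical sample rows: 45 / 1.5**2 = 20.0 and 100 / 2.0**2 = 25.0
sample = pd.DataFrame({"height_cm": [150, 200], "weight_kg": [45, 100]})
result = add_bmi(sample)
print(result)
```

Because the operation is vectorized, the new column always has exactly one value per existing row.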
Data Output
intermediate
Number of unique categories after encoding
After applying one-hot encoding to the 'color' column, how many new columns are created?
Pandas
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})
df_encoded = pd.get_dummies(df, columns=['color'])
print(df_encoded.columns.tolist())
A. 2
B. 4
C. 5
D. 3
💡 Hint
Count unique values in the 'color' column.
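One way to check the hint is to compare `nunique()` with the number of columns that `pd.get_dummies` produces. The sketch below uses a hypothetical 'size' column rather than the challenge data.

```python
import pandas as pd

# Hypothetical data: four distinct sizes spread over six rows
df = pd.DataFrame({"size": ["S", "M", "S", "L", "M", "XL"]})

n_unique = df["size"].nunique()                 # number of distinct categories
encoded = pd.get_dummies(df, columns=["size"])  # one indicator column per category

print(n_unique)
print(encoded.columns.tolist())
```

The original 'size' column is replaced by the indicator columns, so with default settings the column count equals the number of unique values. Passing `drop_first=True` would produce one column fewer.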
🔧 Debug
advanced
Identify the error in feature scaling code
What happens when this code scales a feature using Min-Max scaling? Which error, if any, does it raise?
Pandas
import pandas as pd

df = pd.DataFrame({'score': [10, 20, 30, 40, 50]})
min_val = df['score'].min()
max_val = df['score'].max()
df['scaled'] = (df['score'] - min_val) / (max_val - min_val)
print(df)
A. TypeError
B. No error; the feature is scaled correctly to the range [0, 1]
C. KeyError
D. ZeroDivisionError
💡 Hint
Check the formula for Min-Max scaling carefully.
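For reference, Min-Max scaling maps each value x to (x - min) / (max - min). The sketch below wraps that formula in a hypothetical helper, `min_max_scale`, and adds a guard for constant columns. Returning 0.0 in that case is an assumption made for this example, not a pandas convention.

```python
import pandas as pd

def min_max_scale(s):
    """Scale a numeric Series to [0, 1]; a constant Series maps to all zeros."""
    lo, hi = s.min(), s.max()
    span = hi - lo
    if span == 0:
        # Assumed design choice: avoid 0/0 (which pandas evaluates to NaN)
        return pd.Series(0.0, index=s.index, name=s.name)
    return (s - lo) / span

# Hypothetical data, different from the challenge
temps = pd.Series([5, 10, 25], name="temp")
scaled = min_max_scale(temps)
constant = min_max_scale(pd.Series([7, 7, 7]))
print(scaled.tolist())
print(constant.tolist())
```

With a correct formula, the minimum maps to 0 and the maximum maps to 1.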
Visualization
advanced
Interpret the bar chart of a newly created feature
Given this code that creates a new feature 'age_group' and plots a bar chart of its value counts, what will the chart show?
Pandas
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'age': [15, 22, 37, 45, 52, 67, 70]})
df['age_group'] = pd.cut(df['age'], bins=[0, 18, 35, 60, 100], labels=['Child', 'Young Adult', 'Adult', 'Senior'])
df['age_group'].value_counts().plot(kind='bar')
plt.show()
A. Bar chart with counts: Child=1, Young Adult=2, Adult=2, Senior=2
B. Bar chart with counts: Child=0, Young Adult=2, Adult=3, Senior=2
C. Bar chart with counts: Child=1, Young Adult=1, Adult=2, Senior=3
D. Bar chart with counts: Child=1, Young Adult=1, Adult=3, Senior=2
💡 Hint
Check which ages fall into each bin range.
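When checking bin membership, note that `pd.cut` uses right-inclusive intervals by default, so a value equal to a bin edge falls into the lower bin. The sketch below shows this with hypothetical score data and labels, not the ages from the challenge.

```python
import pandas as pd

# Hypothetical data: 50 and 80 sit exactly on bin edges
scores = pd.Series([10, 50, 51, 80, 95], name="score")
labels = ["Low", "Mid", "High"]

# Default right=True gives the intervals (0, 50], (50, 80], (80, 100]
band = pd.cut(scores, bins=[0, 50, 80, 100], labels=labels)

# sort=False keeps the counts in category order instead of sorting by frequency
counts = band.value_counts(sort=False)
print(band.tolist())
print(counts)
```

Here 50 is labelled 'Low' and 80 is labelled 'Mid', because each edge belongs to the interval on its left.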
🚀 Application
expert
Choosing the best feature transformation for skewed data
You have a feature with a highly skewed distribution. Which transformation is most appropriate to reduce skewness before modeling?
A. Apply a logarithmic transformation (log(x + 1))
B. Apply Min-Max scaling
C. Apply one-hot encoding
D. Apply standardization (z-score scaling)
💡 Hint
Think about transformations that reduce skewness, not just scale data.
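The difference between rescaling and reshaping a distribution can be checked numerically with `Series.skew()`. The sketch below uses hypothetical, roughly exponential data. Min-Max scaling is a linear rescaling, so it leaves skewness essentially unchanged, while `np.log1p` (that is, log(x + 1)) changes the shape of the distribution.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed feature: each value doubles the previous one
raw = pd.Series([1, 2, 4, 8, 16, 32, 64, 128, 256, 512], name="value")

# Linear rescaling: changes the range, not the shape
min_max = (raw - raw.min()) / (raw.max() - raw.min())

# Log transform: compresses large values much more than small ones
logged = np.log1p(raw)

skew_raw = raw.skew()
skew_min_max = min_max.skew()
skew_logged = logged.skew()
print(skew_raw, skew_min_max, skew_logged)
```

On this sample the skewness of the log-transformed values is much closer to zero than that of the raw or Min-Max-scaled values. Real data may behave differently, so it is worth measuring rather than assuming.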