Challenge - 5 Problems

🎖️

Feature Engineering Master

Get all challenges correct to earn this badge!

Test your skills under time pressure!

❓ Predict Output

intermediate

2:00 remaining

Output of creating a new feature by combining columns

What is the output DataFrame after running this code that creates a new feature by combining two columns?

Pandas

import pandas as pd

df = pd.DataFrame({
    'height_cm': [170, 180, 160],
    'weight_kg': [70, 80, 60]
})
df['bmi'] = df['weight_kg'] / ((df['height_cm'] / 100) ** 2)
print(df)

A. {'height_cm': [170, 180, 160], 'weight_kg': [70, 80, 60], 'bmi': [24.22, 24.69, 23.44]}

B. {'height_cm': [170, 180, 160], 'weight_kg': [70, 80, 60], 'bmi': [24.22, 24.69, 23.44, 25.0]}

C. {'height_cm': [170, 180, 160], 'weight_kg': [70, 80, 60]}

D. SyntaxError
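To check your answer after attempting the problem, the sketch below reruns the same computation. The only addition is rounding bmi to two decimals (via Series.round) so the values line up with the options; the original print(df) shows the unrounded floats.

```python
import pandas as pd

# Same data as the problem above
df = pd.DataFrame({
    'height_cm': [170, 180, 160],
    'weight_kg': [70, 80, 60]
})

# BMI = weight in kg divided by height in metres squared (cm / 100 -> m)
df['bmi'] = df['weight_kg'] / ((df['height_cm'] / 100) ** 2)

# Round only for comparison with the answer options
print(df['bmi'].round(2).tolist())  # [24.22, 24.69, 23.44]
print(df.columns.tolist())          # ['height_cm', 'weight_kg', 'bmi']
```

One new value is produced per row, so the new column always has exactly as many entries as the DataFrame has rows.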


❓ Data Output

intermediate

1:30 remaining

Number of unique categories after encoding

After applying one-hot encoding to the 'color' column, how many new columns are created?

Pandas

import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})
df_encoded = pd.get_dummies(df, columns=['color'])
print(df_encoded.columns.tolist())
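A minimal sketch for checking the count once you have answered: it compares the number of distinct values in 'color' with the number of columns get_dummies produces. The names n_categories and n_new_columns are illustrative, not part of the original problem.

```python
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})

# get_dummies drops 'color' and adds one indicator column per distinct value
df_encoded = pd.get_dummies(df, columns=['color'])

n_categories = df['color'].nunique()  # distinct values in the original column
n_new_columns = df_encoded.shape[1]   # 'color' itself is gone, so every column is new

print(n_categories, n_new_columns)    # 3 3
print(df_encoded.columns.tolist())    # ['color_blue', 'color_green', 'color_red']
```

Repeated values ('blue', 'red') do not add columns; only distinct values do. Depending on the pandas version, the indicator columns hold bool (pandas 2.x) or uint8 (older releases) values.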


🔧 Debug

advanced

2:00 remaining

Check the feature scaling code for errors

Does this code raise an error when it scales a feature using Min-Max scaling? If so, which one?

Pandas

import pandas as pd

df = pd.DataFrame({'score': [10, 20, 30, 40, 50]})
min_val = df['score'].min()
max_val = df['score'].max()
df['scaled'] = (df['score'] - min_val) / (max_val - min_val)
print(df)

A. TypeError

B. No error; the column is scaled correctly to the range [0, 1]

C. KeyError

D. ZeroDivisionError
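A sketch for checking your reasoning: the formula from the problem runs without error and maps 10..50 onto 0..1. The helper min_max_scale is hypothetical (not a pandas function) and adds a guard for a constant column. The last two lines show that plain pandas division by a zero range yields NaN rather than raising ZeroDivisionError.

```python
import pandas as pd


def min_max_scale(s: pd.Series) -> pd.Series:
    """Hypothetical helper: scale a numeric Series to [0, 1]."""
    lo, hi = s.min(), s.max()
    span = hi - lo
    if span == 0:
        # A constant column has no spread; return zeros instead of NaN (0 / 0)
        return pd.Series(0.0, index=s.index)
    return (s - lo) / span


df = pd.DataFrame({'score': [10, 20, 30, 40, 50]})
df['scaled'] = min_max_scale(df['score'])
print(df['scaled'].tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]

# Without the guard, pandas turns 0 / 0 into NaN instead of raising an exception
const = pd.Series([5, 5, 5])
unguarded = (const - const.min()) / (const.max() - const.min())
print(unguarded.isna().all())  # True
```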


❓ Visualization

advanced

2:30 remaining

Interpret the bar chart of a newly created feature

Given this code that creates a new feature 'age_group' and plots the count of each group as a bar chart, what will the chart show?

Pandas

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'age': [15, 22, 37, 45, 52, 67, 70]})
df['age_group'] = pd.cut(df['age'], bins=[0, 18, 35, 60, 100], labels=['Child', 'Young Adult', 'Adult', 'Senior'])
df['age_group'].value_counts().plot(kind='bar')
plt.show()

A. Bar chart with counts: Child=1, Young Adult=2, Adult=2, Senior=2

B. Bar chart with counts: Child=0, Young Adult=2, Adult=3, Senior=2

C. Bar chart with counts: Child=1, Young Adult=1, Adult=2, Senior=3

D. Bar chart with counts: Child=1, Young Adult=1, Adult=3, Senior=2
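To verify the counts without drawing anything, the sketch below builds the same 'age_group' column and inspects the counts directly. It assumes pd.cut's default right=True, so each bin includes its upper edge: (0, 18], (18, 35], (35, 60], (60, 100]. Passing sort=False to value_counts is an addition that keeps the groups in bin order.

```python
import pandas as pd

df = pd.DataFrame({'age': [15, 22, 37, 45, 52, 67, 70]})

# right=True (the default): 18 would fall in 'Child', 35 in 'Young Adult', 60 in 'Adult'
df['age_group'] = pd.cut(
    df['age'],
    bins=[0, 18, 35, 60, 100],
    labels=['Child', 'Young Adult', 'Adult', 'Senior'],
)

# sort=False keeps category (bin) order instead of sorting by count
counts = df['age_group'].value_counts(sort=False)
for label, n in counts.items():
    print(f'{label}: {n}')
```

In the original code, value_counts() sorts by count, so the bars appear with 'Adult' first; calling .plot(kind='bar') on the sort=False result instead keeps them in age order.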


🚀 Application

expert

3:00 remaining

Choosing the best feature transformation for skewed data

You have a non-negative feature with a highly right-skewed distribution. Which transformation is most appropriate for reducing skewness before modeling?

A. Apply a logarithmic transformation (log(x + 1))

B. Apply Min-Max scaling

C. Apply one-hot encoding

D. Apply standardization (z-score scaling)
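A quick numeric check of the options, using made-up right-skewed values: skewness is unchanged by any rescaling with a positive factor, so Min-Max scaling and z-score standardization leave it exactly where it was, while np.log1p (log(x + 1), defined for x > -1) compresses the long right tail. One-hot encoding is omitted because it applies to categorical features, not continuous ones.

```python
import numpy as np
import pandas as pd

# Illustrative right-skewed values (invented for this sketch)
x = pd.Series([1, 2, 3, 5, 8, 13, 100, 1000], dtype=float)

log_x = np.log1p(x)                             # log(x + 1); zeros would be fine too
minmax_x = (x - x.min()) / (x.max() - x.min())  # shift + positive scale
zscore_x = (x - x.mean()) / x.std()             # shift + positive scale

print(f"raw skew:     {x.skew():.2f}")
print(f"log1p skew:   {log_x.skew():.2f}")
print(f"min-max skew: {minmax_x.skew():.2f}")
print(f"z-score skew: {zscore_x.skew():.2f}")
```

The two scaling methods change the range and units of the feature but not the shape of its distribution; only the log transform changes the shape.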
