Practice


1. Why do engineered features often help machine learning models perform better?

easy

A. They remove the need for training the model.

B. They make the model run faster by reducing the number of layers.

C. They provide clearer and more useful information for the model to learn from.

D. They increase the size of the dataset automatically.

Solution

Step 1: Understand the role of features in machine learning
Features are the pieces of information the model uses to find patterns and make predictions.
Step 2: Recognize how engineered features improve clarity
Engineered features transform raw data into clearer, more meaningful forms that help the model learn better.
Final Answer:
They provide clearer and more useful information for the model to learn from. -> Option C
Quick Check:
Clear features = Better learning [OK]

Hint: Engineered features clarify data meaning for models [OK]

Common Mistakes:

Thinking engineered features speed up training by reducing layers
Believing engineered features increase dataset size automatically
Assuming engineered features remove the need for training
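
As a rough illustration of the solution above, the sketch below uses made-up data in which a hypothetical price depends on the area of a room. The column names (width, height, area, price) and all numbers are assumptions chosen for the example; they are not part of the question.

```python
import pandas as pd

# Hypothetical data: price is driven by area, not by width or height alone.
df = pd.DataFrame({
    "width": [1, 2, 3, 4, 5],
    "height": [5, 1, 4, 2, 3],
})
df["price"] = 10 * df["width"] * df["height"]

# Engineered feature: combine two raw columns into one meaningful quantity.
df["area"] = df["width"] * df["height"]

# Correlation with the target, for a raw feature and for the engineered one.
corr_width = df["width"].corr(df["price"])
corr_area = df["area"].corr(df["price"])
print(f"width vs price: {corr_width:.2f}")
print(f"area vs price:  {corr_area:.2f}")
```

In this toy data the engineered area column tracks price exactly, while width alone tracks it only loosely. Real datasets are rarely this clean, so treat the example as a picture of the idea rather than a typical result.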

2. Which of the following is the correct way to create a new feature called age_group from an age column in Python using pandas?

easy

A. df['age_group'] = df['age'].mean()

B. df['age_group'] = df['age'] > 30

C. df['age_group'] = df['age'].sum()

D. df['age_group'] = df['age'].apply(lambda x: 'young' if x < 30 else 'old')

Solution

Step 1: Identify how to create categorical features from numeric data
Using apply with a function lets us assign categories like 'young' or 'old' based on age.
Step 2: Check each option for correctness
df['age_group'] = df['age'].apply(lambda x: 'young' if x < 30 else 'old') uses apply with a lambda function to label each row, so it creates age_group correctly. df['age_group'] = df['age'] > 30 creates a True/False column, not a named group. The mean and sum options compute one number for the whole column and copy it into every row, so they do not group anything.
Final Answer:
df['age_group'] = df['age'].apply(lambda x: 'young' if x < 30 else 'old') -> Option D
Quick Check:
Use apply + lambda for new categorical features [OK]

Hint: Use apply with lambda for conditional feature creation [OK]

Common Mistakes:

Using sum or mean instead of conditional logic
Creating boolean instead of categorical feature
Not using apply or map for transformation
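
A minimal sketch of option D on sample data follows. The ages are invented for illustration, and the second column (age_group_vec, a hypothetical name) shows numpy.where as one vectorized alternative that gives the same labels; the question itself only asks about apply.

```python
import numpy as np
import pandas as pd

# Sample ages chosen to include the boundary value 30.
df = pd.DataFrame({"age": [12, 29, 30, 45]})

# Option D: label each row with apply and a lambda.
df["age_group"] = df["age"].apply(lambda x: "young" if x < 30 else "old")

# Alternative (not part of the quiz): the same rule applied to the whole column.
df["age_group_vec"] = np.where(df["age"] < 30, "young", "old")

print(df)
```

Both columns put age 30 in the 'old' group, because the condition is a strict x < 30.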

3. Given this code snippet, what will be the output of print(df) after feature engineering?

import pandas as pd
df = pd.DataFrame({'temp_c': [0, 20, 30]})
df['temp_f'] = df['temp_c'] * 9/5 + 32
print(df)

medium

A.
   temp_c  temp_f
0       0    32.0
1      20    68.0
2      30    86.0

B.
   temp_c  temp_f
0       0     0.0
1      20    20.0
2      30    30.0

C.
   temp_c  temp_f
0       0      32
1      20      68
2      30      86

D. Error: Cannot multiply series by float

Solution

Step 1: Understand the temperature conversion formula
Fahrenheit = Celsius * 9/5 + 32. The code applies this formula to each value in temp_c.
Step 2: Calculate the converted values
For 0°C: 0*9/5+32=32.0; for 20°C: 20*9/5+32=68.0; for 30°C: 30*9/5+32=86.0. The values are floats.
Final Answer:
temp_f holds 32.0, 68.0 and 86.0 for temp_c values 0, 20 and 30 -> Option A
Quick Check:
Correct formula applied element-wise = temp_f of 32.0, 68.0, 86.0 [OK]

Hint: Apply formulas element-wise for new numeric features [OK]

Common Mistakes:

Confusing Celsius and Fahrenheit formulas
Expecting integer instead of float results
Thinking pandas cannot multiply series by float
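
To see why the results are floats rather than integers, the short check below reruns the snippet from the question and inspects the column types. It is only a verification sketch and adds nothing beyond the original code except the dtype printout.

```python
import pandas as pd

df = pd.DataFrame({"temp_c": [0, 20, 30]})

# The / operator is true division, so the result column becomes floating point
# even though temp_c holds integers.
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

print(df.dtypes)
print(df["temp_f"].tolist())
```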

4. You wrote this code to create a new feature is_adult but it gives wrong results. What is the bug?

df['is_adult'] = df['age'] > '18'

medium

A. Comparing numeric age to string '18' causes incorrect results.

B. The operator > cannot be used in pandas.

C. The new feature should be named adult_flag instead.

D. You must use double equals == for comparison.

Solution

Step 1: Identify data type mismatch in comparison
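The code compares numeric age values to the string '18'. With a numeric column, recent pandas versions raise a TypeError for this comparison; if the ages are stored as text, the comparison is alphabetical (so '9' > '18' is True). Either way, the result is not the intended numeric check.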
Step 2: Correct the comparison by using a numeric value
Replace '18' (string) with 18 (integer) to compare numbers properly.
Final Answer:
Comparing numeric age to string '18' causes incorrect results. -> Option A
Quick Check:
Match data types in comparisons [OK]

Hint: Compare numbers to numbers, not strings [OK]

Common Mistakes:

Using string instead of numeric for comparison
Thinking > operator is invalid in pandas
Confusing == with > for this logic
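
The sketch below tries the buggy comparison on two invented columns, one numeric and one stored as text, and then applies the fix. The sample ages and the variable names are assumptions for illustration. The TypeError branch reflects how recent pandas versions behave; older releases may differ.

```python
import pandas as pd

numeric = pd.DataFrame({"age": [9, 17, 25]})
text = pd.DataFrame({"age": ["9", "17", "25"]})

# Case 1: numeric column compared to a string.
try:
    numeric["age"] > "18"
    numeric_error = None
except TypeError as exc:
    numeric_error = type(exc).__name__

# Case 2: text column compared to a string, which compares alphabetically.
buggy = (text["age"] > "18").tolist()

# Fix: make both sides numeric before comparing.
fixed_numeric = (numeric["age"] > 18).tolist()
fixed_text = (pd.to_numeric(text["age"]) > 18).tolist()

print(numeric_error, buggy, fixed_numeric, fixed_text)
```

In the text case the value '9' is flagged as an adult because '9' sorts after '1', which is the kind of silently wrong result the question describes.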

5. You have a dataset with raw timestamps and want to improve your model predicting sales. Which engineered feature is most likely to help the model find useful patterns?

hard

A. Converting timestamps to strings without changes.

B. Extracting the hour of day and day of week from the timestamp.

C. Removing all timestamp data to reduce complexity.

D. Replacing timestamps with random numbers.

Solution

Step 1: Understand what useful information timestamps hold
Timestamps contain time details that can reveal patterns like busy hours or weekdays.
Step 2: Identify which feature extraction helps models
Extracting hour and day of week turns raw timestamps into meaningful features that models can use to detect trends.
Final Answer:
Extracting the hour of day and day of week from the timestamp. -> Option B
Quick Check:
Meaningful time features improve pattern detection [OK]

Hint: Turn raw timestamps into time parts like hour/day [OK]

Common Mistakes:

Keeping timestamps as strings without extraction
Removing timestamps and losing useful information
Replacing timestamps with random data
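
Option B can be sketched with the pandas .dt accessor as shown below. The column names (timestamp, hour, day_of_week) and the two sample timestamps are hypothetical, picked only to make the output easy to read.

```python
import pandas as pd

# Hypothetical raw timestamps stored as text.
df = pd.DataFrame({
    "timestamp": ["2024-01-01 09:30:00", "2024-01-06 18:00:00"],
})

# Parse the text into datetime values first.
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Engineered features: hour of day (0-23) and day of week (Monday=0 ... Sunday=6).
df["hour"] = df["timestamp"].dt.hour
df["day_of_week"] = df["timestamp"].dt.dayofweek

print(df)
```

A model can then pick up patterns such as evening peaks or weekend effects directly from hour and day_of_week, which is much harder to do from the raw timestamp.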

Why engineered features improve models in ML Python - The Real Reasons
