
Training data preparation in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Why is data normalization important before training a model?

Imagine you have a dataset with features measured in very different units, like height in centimeters and income in dollars. Why should you normalize this data before training a machine learning model?

A. Normalization increases the size of the dataset to improve model accuracy.
B. Normalization removes missing values automatically from the dataset.
C. Normalization ensures all features contribute equally by scaling them to a similar range, preventing bias towards features with larger values.
D. Normalization converts categorical data into numerical labels.
💡 Hint

Think about how different scales can affect the learning process of a model.
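
To make the scale problem concrete, here is a minimal sketch of z-score standardization using only the Python standard library. The height and income values are made-up toy data, and `standardize` is a hypothetical helper name, not part of any library.

```python
from statistics import fmean, pstdev

# Hypothetical toy data: (height in cm, income in dollars) per person.
rows = [(150.0, 30000.0), (165.0, 52000.0), (180.0, 75000.0), (172.0, 41000.0)]


def standardize(column):
    """Return z-scores: subtract the mean, divide by the population std."""
    mean = fmean(column)
    std = pstdev(column)
    return [(value - mean) / std for value in column]


heights, incomes = zip(*rows)
scaled_heights = standardize(heights)
scaled_incomes = standardize(incomes)

# Before scaling, the income range dwarfs the height range.
print("raw ranges:", max(incomes) - min(incomes), max(heights) - min(heights))

# After scaling, both columns are centered near 0 with a spread near 1,
# so neither dominates purely because of its unit.
print("scaled heights:", [round(v, 2) for v in scaled_heights])
print("scaled incomes:", [round(v, 2) for v in scaled_incomes])
```

In a real pipeline the mean and standard deviation would normally be computed on the training split only and then reused for the test split.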

Predict Output
intermediate
Output of data splitting code

What is the output of the following Python code that splits data into training and testing sets?

from sklearn.model_selection import train_test_split
X = list(range(10))
y = list(range(10))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(len(X_train), len(X_test))
A. 3 7
B. 7 3
C. 10 0
D. 0 10
💡 Hint

Check how the test_size parameter affects the split.
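
The effect of a fractional `test_size` can be sketched without scikit-learn. The helpers below (`split_sizes` and `shuffle_split`, both hypothetical names) assume the test count is the fraction of the data rounded up, which to my understanding matches how scikit-learn sizes the split. The shuffle uses Python's `random` module, so the order of elements will not match what `train_test_split` produces for the same seed.

```python
import math
import random


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the rest goes to training.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


def shuffle_split(items, test_size, seed):
    """Shuffle a copy of items, then cut it into train and test parts."""
    n_train, _ = split_sizes(len(items), test_size)
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    return shuffled[:n_train], shuffled[n_train:]


train, test = shuffle_split(range(10), test_size=0.3, seed=42)
print(len(train), len(test))  # prints "7 3"
```

Every sample lands in exactly one of the two parts, so the two lengths always add up to the size of the original data.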

Model Choice
advanced
Best data preparation for text classification

You want to train a model to classify movie reviews as positive or negative. Which data preparation step is most important before training?

A. Convert text to lowercase, remove punctuation, and tokenize words.
B. Normalize numerical features to zero mean and unit variance.
C. Fill missing values with the mean of the column.
D. Encode categorical variables using one-hot encoding.
💡 Hint

Think about what you do to raw text before feeding it to a model.
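
A minimal sketch of basic text cleaning, using only the standard library. The `preprocess` function and the sample review are illustrative assumptions; real projects often use a dedicated tokenizer instead of whitespace splitting.

```python
import string


def preprocess(text):
    """Lowercase the text, strip ASCII punctuation, split on whitespace."""
    lowered = text.lower()
    # str.maketrans with a third argument maps those characters to None,
    # so translate() deletes them.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return no_punct.split()


print(preprocess("Great movie, LOVED it!"))  # ['great', 'movie', 'loved', 'it']
```

Note that this simple approach also removes apostrophes, so a word like "didn't" becomes "didnt".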

Metrics
advanced
Evaluating data quality impact on model accuracy

You train two models on the same task. Model A uses raw data with many missing values. Model B uses data where missing values were properly handled. Which metric difference would best show the impact of data preparation?

A. Model A has lower accuracy and lower loss than Model B.
B. Model A has higher accuracy but higher loss than Model B.
C. Both models have the same accuracy but different loss values.
D. Model B has higher accuracy and lower loss than Model A.
💡 Hint

Good data preparation usually raises accuracy and lowers loss.
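
The two metrics can be compared side by side with a short sketch. The labels and predicted probabilities below are invented purely for illustration and are not results from any real model; `accuracy` and `log_loss` are hand-written helpers, not library functions.

```python
import math


def accuracy(y_true, probs, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def log_loss(y_true, probs):
    """Mean binary cross-entropy; lower means better-calibrated confidence."""
    total = 0.0
    for t, p in zip(y_true, probs):
        total += -math.log(p if t == 1 else 1.0 - p)
    return total / len(y_true)


# Hypothetical labels and predicted probabilities of the positive class.
y_true = [1, 0, 1, 1, 0]
probs_a = [0.6, 0.7, 0.4, 0.55, 0.45]  # stand-in for a model fit on messy data
probs_b = [0.9, 0.2, 0.8, 0.7, 0.1]    # stand-in for a model fit on cleaned data

for name, probs in (("Model A", probs_a), ("Model B", probs_b)):
    print(name, "accuracy:", accuracy(y_true, probs),
          "loss:", round(log_loss(y_true, probs), 3))
```

Accuracy only counts right and wrong predictions, while the loss also reflects how confident each prediction was, which is why the two are usually reported together.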

🔧 Debug
expert
Identify the error in data preprocessing code

What error will this Python code raise when preparing data for training?

import numpy as np
from sklearn.preprocessing import StandardScaler
X = np.array([[1, 2], [3, 4], [5, 6]])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled[3])
A. IndexError: index 3 is out of bounds for axis 0 with size 3
B. TypeError: 'StandardScaler' object is not callable
C. ValueError: could not convert string to float
D. No error, prints scaled values
💡 Hint

Check the size of the array and the index accessed.
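
Zero-based indexing is easy to check with a small sketch. It uses plain Python lists rather than NumPy so that it runs without extra packages; NumPy arrays raise an `IndexError` in the same situation, with a more detailed message. `get_row` is a hypothetical helper written for this illustration.

```python
# Three rows, so the valid positive indices are 0, 1, and 2.
rows = [[1, 2], [3, 4], [5, 6]]


def get_row(data, index):
    """Return data[index], raising a descriptive IndexError when out of range."""
    if not -len(data) <= index < len(data):
        raise IndexError(f"index {index} is out of range for {len(data)} rows")
    return data[index]


print(len(rows))          # 3
print(get_row(rows, 2))   # [5, 6], the last row
print(get_row(rows, -1))  # [5, 6], negative indices count from the end

try:
    get_row(rows, 3)
except IndexError as exc:
    print("IndexError:", exc)
```

Scaling a dataset changes its values but not its number of rows, so the same bounds apply before and after preprocessing.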