Imagine you have a dataset with features measured in very different units, like height in centimeters and income in dollars. Why should you normalize this data before training a machine learning model?
Think about how different scales can affect the learning process of a model.
Normalization scales features to a similar range so that no single feature dominates the learning process due to its scale. This helps models learn better and faster.
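As a quick illustration, min-max normalization (one common choice) can be sketched with NumPy; the height and income values here are made-up examples:

```python
import numpy as np

# Height in cm and income in dollars: wildly different scales.
X = np.array([[170.0, 40_000.0],
              [160.0, 90_000.0],
              [180.0, 65_000.0]])

# Min-max normalization: rescale each feature (column) to [0, 1].
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_scaled)
```

After scaling, both columns lie in [0, 1], so neither feature dominates purely because of its units.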
What is the output of the following Python code that splits data into training and testing sets?
from sklearn.model_selection import train_test_split

X = list(range(10))
y = list(range(10))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(len(X_train), len(X_test))
Check how the test_size parameter affects the split.
With test_size=0.3, 30% of the 10 items go to the test set, so the output is 7 3: 7 items for training and 3 for testing.
You want to train a model to classify movie reviews as positive or negative. Which data preparation step is most important before training?
Think about what you do to raw text before feeding it to a model.
Text data needs cleaning like lowercasing, removing punctuation, and tokenizing to convert it into a form the model can understand.
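A minimal cleaning pipeline for those three steps might look like this (the function name and sample review are illustrative):

```python
import string

def clean_text(review: str) -> list[str]:
    # Lowercase, strip punctuation, then tokenize on whitespace.
    lowered = review.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return no_punct.split()

print(clean_text("Great movie, loved it!"))
```

The resulting token list is what you would then map to numeric IDs or vectors for the model.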
You train two models on the same task. Model A uses raw data with many missing values. Model B uses data where missing values were properly handled. Which metric difference would best show the impact of data preparation?
Think about which evaluation metrics would reflect the difference in data quality between the two models.
Handling missing values properly improves data quality, leading to better model accuracy and lower loss.
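One simple strategy Model B might use is mean imputation, sketched here with NumPy (the feature values are made up for illustration):

```python
import numpy as np

# One feature column with missing values encoded as NaN.
X = np.array([1.0, 2.0, np.nan, 4.0, np.nan, 5.0])

# Mean imputation: replace each NaN with the mean of the observed values.
mean = np.nanmean(X)
X_imputed = np.where(np.isnan(X), mean, X)
print(X_imputed)
```

More sophisticated options (median imputation, model-based imputation) exist, but even this baseline avoids the errors or biased estimates that raw NaNs can cause during training.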
What error will this Python code raise when preparing data for training?
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2], [3, 4], [5, 6]])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled[3])
Check the size of the array and the index accessed.
X_scaled has only 3 rows (indices 0, 1, 2), so accessing index 3 raises an IndexError.