Recall & Review

beginner

What is training data preparation in machine learning?

Training data preparation is the process of cleaning, organizing, and formatting raw data so that a machine learning model can learn from it effectively.

beginner

Why do we need to clean data before training a model?

Cleaning data removes or corrects errors, missing values, duplicates, and inconsistencies that would otherwise mislead the model and reduce its accuracy.
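
As a rough illustration of the cleaning step, the sketch below drops rows with missing values and exact duplicates using only the Python standard library. The helper name `clean_rows`, the sample rows, and the choice to treat `None` and empty strings as "missing" are assumptions made for this example; real projects often impute missing values instead of dropping them.

```python
def clean_rows(rows):
    """Drop rows containing missing values, then drop exact duplicates.

    Rows are expected to be tuples so they can be stored in a set.
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Assumption for this sketch: None and "" both count as missing.
        if any(value is None or value == "" for value in row):
            continue
        # Skip rows that have already been kept once.
        if row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned


raw = [
    (25, "red"),
    (25, "red"),     # exact duplicate of the first row
    (None, "blue"),  # missing value
    (31, "green"),
]
cleaned = clean_rows(raw)
```

Here `cleaned` keeps only the first and last rows, in their original order.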

intermediate

What is feature scaling and why is it important?

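Feature scaling rescales numeric features to a comparable range (for example 0 to 1, or zero mean and unit variance) so that features with large values do not dominate the others. This helps gradient-based and distance-based models train faster and perform better.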
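
One common form of feature scaling is min-max scaling, which maps each feature to the range 0 to 1. The sketch below is a minimal standard-library version; the function name `min_max_scale` and the sample values are made up for illustration, and other schemes (such as standardization) are equally valid.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no range information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Two features on very different scales (hypothetical data).
ages = [20, 30, 40]
incomes = [20000, 50000, 80000]

scaled_ages = min_max_scale(ages)
scaled_incomes = min_max_scale(incomes)
```

After scaling, both features lie between 0 and 1, so neither dominates purely because of its units.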

beginner

Explain the difference between training, validation, and test data.

Training data is used to teach the model. Validation data helps tune the model’s settings. Test data checks how well the model works on new, unseen data.
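
A three-way split can be sketched with the standard library alone, as below. The function name `split_dataset`, the 70/15/15 proportions, and the fixed seed are assumptions chosen for the example, not values prescribed by this material; libraries such as scikit-learn provide their own splitting utilities.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle items and split them into train, validation, and test lists."""
    shuffled = list(items)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    # Whatever remains after train and validation becomes the test set.
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
```

Every example lands in exactly one of the three sets, so the test set stays unseen during training and tuning.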

intermediate

What is data augmentation and when is it used?

Data augmentation creates new training examples by modifying existing data, such as flipping or rotating images. It is used when training data is limited, to enlarge the training set and make the model more robust.
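
The image-flipping example can be sketched as follows, representing a tiny grayscale image as a nested list of pixel values. The function name `flip_horizontal` and the 2x3 sample image are hypothetical; real pipelines usually rely on an image or array library for this.

```python
def flip_horizontal(image):
    """Return a new image with each row of pixels reversed (mirrored left to right)."""
    return [list(reversed(row)) for row in image]


# A tiny 2x3 "image" of pixel intensities (made-up values).
image = [
    [1, 2, 3],
    [4, 5, 6],
]

# The augmented training set holds the original plus its mirrored copy.
augmented = [image, flip_horizontal(image)]
```

The original image is left unchanged, so the training set doubles in size without collecting new data.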

Which step is NOT part of training data preparation?

A. Cleaning missing values

B. Splitting data into sets

C. Training the model

D. Scaling features

Why do we split data into training, validation, and test sets?

A. To evaluate model performance fairly

B. To remove errors from data

C. To make the dataset smaller

D. To speed up data cleaning

What does feature scaling do?

A. Adds new data points

B. Removes missing data

C. Splits data into groups

D. Changes data to a similar range

Data augmentation is mainly used to:

A. Create more training examples

B. Clean data errors

C. Split data into sets

D. Scale features

Which of these is a common data cleaning task?

A. Normalizing features

B. Removing duplicates

C. Splitting data

D. Training the model

Describe the key steps involved in preparing training data for a machine learning model.

Explain why splitting data into training, validation, and test sets is important.

Practice

1. What is the main purpose of training data preparation in machine learning?

easy

A. To clean and organize data for better model learning

B. To create the final model architecture

C. To deploy the model to production

D. To write the code for model training

Training data preparation in Prompt Engineering / GenAI - Cheat Sheet & Quick Revision

Solution

Step 1: Understand the role of training data preparation

Step 2: Differentiate from other steps in machine learning

Final Answer: A. To clean and organize data for better model learning

Quick Check:

Solution

Step 1: Recall the scikit-learn function for splitting data

Step 2: Check the syntax of each option

Final Answer:

Quick Check:

Solution

Step 1: Understand the data shape and split ratio

Step 2: Calculate the shapes of training and testing sets

Final Answer:

Quick Check:

Solution

Step 1: Check input data type compatibility

Step 2: Verify method usage

Final Answer:

Quick Check:

Solution

Step 1: Clean missing values first

Step 2: Encode categorical features before normalization

Step 3: Normalize numeric features and then split data

Final Answer:

Quick Check: