MLOps | DevOps | ~30 mins

Training data pipeline automation in MLOps - Mini Project: Build & Apply

Training Data Pipeline Automation
📖 Scenario: You are working as a machine learning engineer. You need to automate the process of preparing training data for your ML model. This involves collecting raw data, filtering it based on quality, and then outputting the cleaned data ready for training.
🎯 Goal: Build a simple Python script that automates a training data pipeline. The script will start with raw data, apply a quality filter, and then output the cleaned data.
📋 What You'll Learn
Create a dictionary with raw data samples and their quality scores
Add a quality threshold variable to filter data
Use a dictionary comprehension to select only data samples at or above the threshold
Print the filtered data dictionary
💡 Why This Matters
🌍 Real World
Automating data preparation saves time and reduces errors in machine learning projects by ensuring that only good-quality data is used for training.
💼 Career
Data engineers and ML engineers often build automated pipelines like this to prepare data efficiently and reliably for model training.
1
Create raw data dictionary
Create a dictionary called raw_data with these exact entries: 'sample1': 0.85, 'sample2': 0.45, 'sample3': 0.95, 'sample4': 0.30, 'sample5': 0.75 representing data sample names and their quality scores.
Hint: Use curly braces to create a dictionary. Each entry has a sample name as a string key and a float value for quality.
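A minimal sketch of this step, using the exact entries listed in the exercise:

```python
# Raw data: each key is a sample name, each value is that sample's quality score.
raw_data = {
    'sample1': 0.85,
    'sample2': 0.45,
    'sample3': 0.95,
    'sample4': 0.30,
    'sample5': 0.75,
}
```

Writing one entry per line keeps the dictionary easy to scan and to extend later.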

2
Set quality threshold
Create a variable called quality_threshold and set it to 0.7 to filter out low-quality data samples.
Hint: Just assign the number 0.7 to the variable named quality_threshold.
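As a small sketch, the threshold is a plain float assignment. The comparisons underneath are only illustrations (they are not part of the exercise) of how a score is checked against it, including a score exactly equal to the threshold:

```python
# Minimum quality score a sample needs to be kept for training.
quality_threshold = 0.7

# Illustrative checks, not required by the exercise.
passes_high = 0.85 >= quality_threshold      # True: above the threshold
passes_low = 0.45 >= quality_threshold       # False: below the threshold
passes_boundary = 0.7 >= quality_threshold   # True: equal scores are kept with >=
```

Keeping the threshold in a named variable means the cutoff can be changed in one place without touching the filtering logic.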

3
Filter data using dictionary comprehension
Create a new dictionary called filtered_data using a dictionary comprehension. Include only those entries from raw_data where the quality score is greater than or equal to quality_threshold. Use sample and score as the loop variables.
Hint: Use dictionary comprehension syntax: {key: value for key, value in my_dict.items() if condition}, where my_dict is the dictionary you are filtering.
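The sketch below shows the comprehension next to an equivalent for loop, to make clear what the one-line form does. The data and threshold are repeated from the earlier steps so the snippet runs on its own; the name filtered_loop is only for illustration and is not part of the exercise.

```python
raw_data = {
    'sample1': 0.85,
    'sample2': 0.45,
    'sample3': 0.95,
    'sample4': 0.30,
    'sample5': 0.75,
}
quality_threshold = 0.7

# Dictionary comprehension: keep entries whose score is at or above the threshold.
filtered_data = {
    sample: score
    for sample, score in raw_data.items()
    if score >= quality_threshold
}

# Equivalent explicit loop (illustrative only).
filtered_loop = {}
for sample, score in raw_data.items():
    if score >= quality_threshold:
        filtered_loop[sample] = score
```

Both forms produce the same dictionary; the comprehension is simply the more compact way to write it.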

4
Print filtered data
Write a print statement to display the filtered_data dictionary.
Hint: Use print(filtered_data) to show the filtered dictionary.
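Putting the four steps together, the finished script might look like the following sketch. It uses only the names and values given in the steps above.

```python
# Step 1: raw samples mapped to their quality scores.
raw_data = {
    'sample1': 0.85,
    'sample2': 0.45,
    'sample3': 0.95,
    'sample4': 0.30,
    'sample5': 0.75,
}

# Step 2: minimum score a sample must reach to be kept.
quality_threshold = 0.7

# Step 3: keep only samples at or above the threshold.
filtered_data = {
    sample: score
    for sample, score in raw_data.items()
    if score >= quality_threshold
}

# Step 4: display the cleaned data.
print(filtered_data)
# prints {'sample1': 0.85, 'sample3': 0.95, 'sample5': 0.75}
```

The samples appear in their original order because Python dictionaries preserve insertion order (guaranteed since Python 3.7).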