
Why data versioning is harder than code versioning in MLOps - See It in Action

Why Data Versioning is Harder Than Code Versioning
📖 Scenario: You are a machine learning engineer who wants to understand why managing versions of data is harder than managing versions of code, so that you can plan your projects accordingly.
🎯 Goal: Build a simple Python program that shows a dataset and tracks changes to it step-by-step, illustrating why data versioning is harder than code versioning.
📋 What You'll Learn
Create a dictionary called dataset with exact entries for three data points
Create a variable called version to track the dataset version number
Update the dataset by changing one data point to simulate a new version
Print the dataset and version to show the current state
💡 Why This Matters
🌍 Real World
In real machine learning projects, datasets change often and can be very large. Tracking these changes carefully is important to reproduce results and debug models.
💼 Career
Understanding data versioning challenges helps you work better with data engineers and ML engineers, improving collaboration and project reliability.
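One way to see why tracking data changes takes extra care: a bare version counter says nothing about what the data actually contains. The sketch below is not part of the exercise. It assumes a small, JSON-serializable dataset and uses a hypothetical helper named fingerprint to derive an identifier from the content itself, so that any change to a value yields a different identifier.

```python
import hashlib
import json


def fingerprint(data):
    """Hypothetical helper: SHA-256 hex digest of a JSON-serializable dataset."""
    # sort_keys=True makes the digest independent of key insertion order.
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


before = {'user1': 25, 'user2': 30, 'user3': 22}

# Copy the data, then change a single value to simulate a new version.
after = dict(before)
after['user2'] = 31

# A one-value change is enough to produce a different fingerprint.
changed = fingerprint(before) != fingerprint(after)
print(changed)
```

For three entries this is instant. For a very large dataset, computing such a fingerprint means reading all of the data, which is part of what makes data harder to version than code.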
1
Create the initial dataset
Create a dictionary called dataset with these exact entries: 'user1': 25, 'user2': 30, 'user3': 22
Hint: Use curly braces {} to create a dictionary with keys and values.
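A possible solution for this step, using the exact keys and values given above:

```python
# Initial dataset: three data points keyed by user ID.
dataset = {'user1': 25, 'user2': 30, 'user3': 22}
```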

2
Add a version number variable
Create a variable called version and set it to 1 to track the dataset version
Hint: Just assign the number 1 to the variable version.
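A possible solution for this step. The dataset from step 1 is repeated so the snippet runs on its own:

```python
# Dataset from step 1.
dataset = {'user1': 25, 'user2': 30, 'user3': 22}

# Version counter for the dataset, starting at 1.
version = 1
```

Note that version is just an integer sitting next to the data. Nothing ties it to the contents of dataset, so keeping the two in sync is up to you.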

3
Update the dataset to simulate a new version
Change the value for 'user2' in dataset to 31 and increase version by 1
Hint: Use dataset['user2'] = 31 to update the value and version += 1 to increase the version.
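A possible solution for this step. The variable previous is an addition for illustration only and is not required by the exercise: it keeps a copy of version 1 so the old value can still be inspected after the update.

```python
# State after steps 1 and 2.
dataset = {'user1': 25, 'user2': 30, 'user3': 22}
version = 1

# Optional, not part of the exercise: keep a copy of version 1.
previous = dict(dataset)

# Step 3: change one data point and bump the version number.
dataset['user2'] = 31
version += 1
```

Without the copy, the assignment overwrites the old value in place and version 1 of the data can no longer be recovered from dataset alone. Code versioning tools keep that history for you; with plain data in memory, you have to preserve it yourself.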

4
Print the current dataset and version
Write two print statements: one to display dataset and one to display version
Hint: Use print(dataset) and print(version) to show the current data and version.
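Putting the four steps together, the finished program might look like this:

```python
# Step 1: initial dataset.
dataset = {'user1': 25, 'user2': 30, 'user3': 22}

# Step 2: version counter.
version = 1

# Step 3: simulate a new version of the data.
dataset['user2'] = 31
version += 1

# Step 4: show the current state.
print(dataset)  # {'user1': 25, 'user2': 31, 'user3': 22}
print(version)  # 2
```

The output shows only the current state. The program reports that the dataset is at version 2, but not which entry changed or what its earlier value was.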