
Hardware and framework version tracking in MLOps - Commands & Configuration

Introduction
When you train machine learning models, it is important to know exactly what hardware and software versions were used. This helps you reproduce results and fix problems later. Tracking hardware and framework versions automatically saves time and avoids confusion.
When to Use
When you want to record the GPU model and driver version used for training a model.
When you need to log the exact version of TensorFlow or PyTorch your code ran on.
When you want to compare model performance across different hardware setups.
When you need to share your model training environment details with teammates.
When you want to ensure your model can be reproduced months later with the same setup.
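Even before a tracking server is involved, the interpreter and operating-system details can be gathered with the standard library alone. This is a minimal sketch assuming Python 3.8+; `collect_environment` is a hypothetical helper name, not part of MLflow or any other library.

```python
import platform
import sys


def collect_environment() -> dict:
    """Return basic interpreter and OS details as a flat dict of strings."""
    return {
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        # sys.executable can be empty or None in embedded interpreters
        "executable": sys.executable or "unknown",
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key} = {value}")
```

A flat dict of strings is used on purpose: it maps directly onto experiment-tracking parameters, which are stored as strings.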
Commands
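Install MLflow, an experiment-tracking tool in which you can record parameters such as hardware and framework versions. The example script in the Code Example section also imports PyTorch, so install it as well (pip install torch) if it is not already present.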
Terminal
pip install mlflow
Expected output (abridged; version numbers will vary):
Collecting mlflow
  Downloading mlflow-2.4.1-py3-none-any.whl (18.7 MB)
Installing collected packages: mlflow
Successfully installed mlflow-2.4.1
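To confirm which version pip actually installed (and to record framework versions without importing heavy libraries such as PyTorch or TensorFlow), you can read the installed package metadata. This is a sketch using the standard-library `importlib.metadata` module (Python 3.8+); `package_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Map each distribution name to its installed version, or 'not installed'."""
    result = {}
    for name in names:
        try:
            # Reads the installed distribution's metadata; does not import the package
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result


if __name__ == "__main__":
    print(package_versions(["mlflow", "torch", "tensorflow"]))
```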
Save the script from the Code Example section as track_versions.py and run it to log hardware and framework versions to MLflow.
Terminal
python track_versions.py
Expected output (log format varies by MLflow version; the values depend on your machine):
2024/06/01 12:00:00 INFO mlflow.tracking.fluent: Experiment with name 'HardwareFrameworkTracking' does not exist. Creating a new experiment.
Logged hardware and framework versions to MLflow

The run's parameters, viewable in the MLflow UI (mlflow ui), will look like:
gpu_name = NVIDIA GeForce RTX 3080
cuda_version = 11.8
pytorch_version = 2.0.1
python_version = 3.11.4
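Once the script has run on two machines, comparing the logged parameters shows exactly what differs between the setups. In MLflow, a run's parameters are available as a dict of strings via `mlflow.get_run(run_id).data.params`; the sketch below assumes you already have two such dicts and uses a hypothetical helper, `diff_environments`. The example values are illustrative, not real runs.

```python
def diff_environments(run_a: dict, run_b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    Keys present in only one run appear with None for the missing side.
    """
    keys = sorted(set(run_a) | set(run_b))
    return {
        key: (run_a.get(key), run_b.get(key))
        for key in keys
        if run_a.get(key) != run_b.get(key)
    }


if __name__ == "__main__":
    laptop = {"gpu_name": "No GPU detected", "pytorch_version": "2.0.1", "python_version": "3.11.4"}
    server = {"gpu_name": "NVIDIA GeForce RTX 3080", "pytorch_version": "2.0.1", "python_version": "3.11.4"}
    print(diff_environments(laptop, server))
    # {'gpu_name': ('No GPU detected', 'NVIDIA GeForce RTX 3080')}
```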
Key Concept

If you remember nothing else from this pattern, remember: automatically logging hardware and framework versions ensures your ML experiments are reproducible and understandable later.

Code Example
import mlflow
import platform
import subprocess

import torch


def get_gpu_name():
    """Return the GPU name(s) reported by nvidia-smi, one per line."""
    try:
        result = subprocess.run(['nvidia-smi', '--query-gpu=name', '--format=csv,noheader'],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, check=True)
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # nvidia-smi is missing (no NVIDIA driver) or exited with an error
        return 'No GPU detected'


def get_cuda_version():
    """Return the CUDA toolkit version reported by nvcc, e.g. '11.8'.

    This is the installed toolkit version, which can differ from the CUDA
    version PyTorch was built against (available as torch.version.cuda).
    """
    try:
        result = subprocess.run(['nvcc', '--version'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, check=True)
        # nvcc prints a line such as "Cuda compilation tools, release 11.8, V11.8.89"
        for line in result.stdout.splitlines():
            if 'release' in line:
                return line.split('release')[-1].strip().split(',')[0]
        return 'Unknown CUDA version'
    except (OSError, subprocess.CalledProcessError):
        # nvcc is not on PATH or failed to run
        return 'CUDA not installed'


# Creates the experiment on first use, then reuses it
mlflow.set_experiment('HardwareFrameworkTracking')
with mlflow.start_run():
    gpu_name = get_gpu_name()
    cuda_version = get_cuda_version()
    pytorch_version = torch.__version__
    python_version = platform.python_version()

    # Each value is stored as a string parameter on the current run
    mlflow.log_param('gpu_name', gpu_name)
    mlflow.log_param('cuda_version', cuda_version)
    mlflow.log_param('pytorch_version', pytorch_version)
    mlflow.log_param('python_version', python_version)

    print('Logged hardware and framework versions to MLflow')
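The introduction mentions the GPU driver version, which the script above does not record. nvidia-smi can report it alongside the name: `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader` prints one comma-separated line per GPU. This is a sketch under that assumption; `parse_gpu_csv` and `query_gpus` are hypothetical helper names, and the sample line in the usage note is illustrative.

```python
import subprocess


def parse_gpu_csv(text: str) -> list:
    """Parse 'name, driver_version' lines (one per GPU) into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so the name is kept whole
        name, driver = (field.strip() for field in line.rsplit(",", 1))
        gpus.append({"gpu_name": name, "driver_version": driver})
    return gpus


def query_gpus() -> list:
    """Run nvidia-smi; return an empty list if it is missing or fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for index, gpu in enumerate(query_gpus()):
        print(index, gpu["gpu_name"], gpu["driver_version"])
```

For example, the line "NVIDIA GeForce RTX 3080, 535.104.05" parses to one dict with both fields. On multi-GPU machines, logging each entry under an indexed name (for instance gpu0_driver_version) keeps the parameters distinct.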
Common Mistakes
Not logging hardware or framework versions at all.
Without this info, you cannot reproduce or debug your model training environment.
Always include code to log hardware and framework versions as parameters in your experiment tracking.
Manually writing hardware info in notes instead of automating it.
Manual notes can be forgotten or inaccurate, causing confusion.
Use code to detect and log hardware and software versions automatically.
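Automated capture also makes it easy to share the environment with teammates as a file rather than as hand-written notes. A minimal sketch, assuming a plain JSON file is acceptable; `save_snapshot` and `load_snapshot` are hypothetical helpers. Inside an MLflow run, `mlflow.log_dict(dictionary, artifact_file)` can store the same dict as a JSON artifact instead.

```python
import json
from pathlib import Path


def save_snapshot(env: dict, path) -> None:
    """Write the environment dict as sorted, indented JSON so diffs stay readable."""
    Path(path).write_text(json.dumps(env, indent=2, sort_keys=True) + "\n", encoding="utf-8")


def load_snapshot(path) -> dict:
    """Read back a snapshot written by save_snapshot."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Sorting the keys means two snapshots of the same environment produce byte-identical files, so an ordinary text diff highlights only real changes.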
Summary
Install MLflow to track machine learning experiments.
Write a Python script to automatically detect and log hardware and framework versions.
Run the script to save this info in MLflow for reproducibility and debugging.

Practice

1. Why is it important to track hardware and framework versions in MLOps?
easy
A. To reduce the size of the model files
B. To make the code run faster on any machine
C. To ensure experiments can be reproduced exactly later
D. To avoid using any cloud services

Solution

  1. Step 1: Understand reproducibility in experiments

    Reproducibility means you can get the same results again by using the same setup.
  2. Step 2: Connect version tracking to reproducibility

    Tracking hardware and framework versions helps recreate the exact environment for experiments.
  3. Final Answer:

    To ensure experiments can be reproduced exactly later -> Option C
  4. Quick Check:

    Reproducibility = Track versions [OK]
Hint: Reproducibility needs exact version info [OK]
Common Mistakes:
  • Thinking tracking speeds up code
  • Confusing version tracking with file size
  • Assuming cloud use is related
2. Which of the following is the correct way to store a framework version in a Python dictionary for tracking?
easy
A. versions = {"tensorflow": "2.12.0"}
B. versions = (tensorflow: 2.12.0)
C. versions = [tensorflow = "2.12.0"]
D. versions = {tensorflow => "2.12.0"}

Solution

  1. Step 1: Recall Python dictionary syntax

    Python dictionaries use curly braces around key: value pairs, and string keys and values must be quoted.
  2. Step 2: Check each option's syntax

    versions = {"tensorflow": "2.12.0"} uses correct syntax with quotes and colon. Others use invalid syntax for Python dictionaries.
  3. Final Answer:

    versions = {"tensorflow": "2.12.0"} -> Option A
  4. Quick Check:

    Python dict = {key: value} [OK]
Hint: Python dict uses {"key": "value"} syntax [OK]
Common Mistakes:
  • Using parentheses instead of braces
  • Using equal sign inside list
  • Using => instead of : in dict
3. Given this Python code snippet for tracking versions:
versions = {"tensorflow": "2.12.0", "cuda": "11.8"}
print(versions.get("cuda"))

What is the output?
medium
A. "11.8"
B. 11.8
C. cuda
D. None

Solution

  1. Step 1: Understand the dictionary and get method

    The dictionary stores strings as values. The get method returns the value for the key "cuda".
  2. Step 2: Identify the value for key "cuda"

    The value is the string "11.8". print() writes a string's contents without quotes, so the output is 11.8. Quotes appear only when printing the repr, e.g. print(repr(versions.get("cuda"))).
  3. Final Answer:

    11.8 -> Option B
  4. Quick Check:

    print(versions.get("cuda")) shows 11.8 [OK]
Hint: print() shows a string's contents without quotes [OK]
Common Mistakes:
  • Expecting print() to include the quotes of a string
  • Expecting key name as output
  • Thinking get returns None if key exists
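To see the difference between a string's printed form and its repr, and how get behaves for a key that is absent ("cudnn" here is just a hypothetical missing key):

```python
versions = {"tensorflow": "2.12.0", "cuda": "11.8"}

print(versions.get("cuda"))          # 11.8   (print shows the string's contents)
print(repr(versions.get("cuda")))    # '11.8' (repr adds the quotes)
print(versions.get("cudnn"))         # None   (missing key, no default given)
print(versions.get("cudnn", "n/a"))  # n/a    (missing key, explicit default)
```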
4. You wrote this code to update the hardware version:
hardware_versions = {"GPU": "NVIDIA RTX 3090"}
hardware_versions["GPU"] = NVIDIA RTX 4090
print(hardware_versions)

What error will occur?
medium
A. No error, prints updated dictionary
B. NameError because NVIDIA RTX 4090 is not quoted
C. SyntaxError because the unquoted value is not valid Python
D. KeyError because GPU key is missing

Solution

  1. Step 1: Check the assignment line syntax

    The value NVIDIA RTX 4090 is not in quotes, so Python reads it as the separate tokens NVIDIA, RTX and 4090.
  2. Step 2: Understand how Python handles this

    Two names and a number side by side, with no operator or comma between them, do not form a valid expression, so the parser raises a SyntaxError before any line runs. A NameError would occur only for a single unquoted name such as NVIDIA, which is valid syntax that fails at runtime.
  3. Final Answer:

    SyntaxError because the unquoted value is not valid Python -> Option C
  4. Quick Check:

    Space-separated unquoted words = SyntaxError [OK]
Hint: Always quote string values in Python [OK]
Common Mistakes:
  • Thinking KeyError occurs for existing keys
  • Expecting a NameError when the line cannot even be parsed
  • Believing code runs without error
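You can check which error a snippet produces by first compiling it (syntax errors surface at this stage, before anything runs) and then executing it. A small sketch; `error_kind` is a hypothetical helper, and exec should only be used on trusted snippets like these quiz examples.

```python
def error_kind(source: str) -> str:
    """Return 'SyntaxError', the runtime exception name, or 'no error'."""
    try:
        code = compile(source, "<quiz>", "exec")  # parse only, nothing runs yet
    except SyntaxError:
        return "SyntaxError"
    try:
        exec(code, {})  # run in a fresh, empty namespace
    except Exception as exc:
        return type(exc).__name__
    return "no error"


base = 'hardware_versions = {"GPU": "NVIDIA RTX 3090"}\n'
print(error_kind(base + 'hardware_versions["GPU"] = NVIDIA RTX 4090\n'))    # SyntaxError
print(error_kind(base + 'hardware_versions["GPU"] = NVIDIA\n'))             # NameError
print(error_kind(base + 'hardware_versions["GPU"] = "NVIDIA RTX 4090"\n'))  # no error
```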
5. You want to track both hardware and framework versions in one dictionary. Which code correctly updates the TensorFlow version without losing any other tracked info?
versions = {"hardware": {"GPU": "NVIDIA RTX 3090"}, "framework": {"tensorflow": "2.11.0", "torch": "1.13.0"}}
# Update tensorflow to 2.12.0 here
hard
A. versions.update({"tensorflow": "2.12.0"})
B. versions["framework"] = {"tensorflow": "2.12.0"}
C. versions["tensorflow"] = "2.12.0"
D. versions["framework"]["tensorflow"] = "2.12.0"

Solution

  1. Step 1: Understand nested dictionary structure

    "framework" key holds a dictionary with tensorflow version inside.
  2. Step 2: Update tensorflow version inside nested dictionary

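    Use versions["framework"]["tensorflow"] = "2.12.0" to change only that entry, leaving the hardware info and the torch version intact.
  3. Step 3: Check other options for overwriting risk

    versions["framework"] = {"tensorflow": "2.12.0"} (B) replaces the entire framework dict and drops the torch version. versions["tensorflow"] = "2.12.0" (C) and versions.update({"tensorflow": "2.12.0"}) (A) add a new top-level key and leave the nested tensorflow version at 2.11.0, breaking the structure.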
  4. Final Answer:

    versions["framework"]["tensorflow"] = "2.12.0" -> Option D
  5. Quick Check:

    Update nested dict key correctly [OK]
Hint: Update nested dict keys to keep all info [OK]
Common Mistakes:
  • Replacing whole nested dict by mistake
  • Adding keys at wrong dictionary level
  • Using update() incorrectly on nested keys
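When several nested entries change at once, a small recursive merge applies them without dropping sibling keys. Calling dict.update on the outer dict would instead replace the whole "framework" dict. This is a sketch; `deep_update` is a hypothetical helper, not a standard-library function.

```python
def deep_update(base: dict, updates: dict) -> dict:
    """Recursively merge `updates` into `base` in place and return `base`.

    Nested dicts are merged key by key, so sibling entries
    (such as torch next to tensorflow) are kept.
    """
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)
        else:
            base[key] = value
    return base


versions = {"hardware": {"GPU": "NVIDIA RTX 3090"},
            "framework": {"tensorflow": "2.11.0", "torch": "1.13.0"}}
deep_update(versions, {"framework": {"tensorflow": "2.12.0"}})
print(versions)
# {'hardware': {'GPU': 'NVIDIA RTX 3090'}, 'framework': {'tensorflow': '2.12.0', 'torch': '1.13.0'}}
```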