
Regulatory compliance (GDPR, AI Act) in MLOps - Commands & Configuration

Introduction
Regulatory compliance means following laws that protect people's data and privacy. For AI and machine learning, this means making sure models and data handling meet rules like GDPR and the AI Act. This helps avoid legal trouble and builds trust with users.
Typical situations where this pattern applies:

  • When you collect personal data for training machine learning models and need to protect user privacy.
  • When you deploy AI models in Europe, where GDPR and AI Act rules apply.
  • When you want to document how your AI model uses data so you can show compliance during audits.
  • When you need to control who can access sensitive data and AI model outputs.
  • When you want to automate checks that your AI system respects data protection laws.
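The last situation, automated checks, can be sketched as a small gate that runs before training or deployment. This is a minimal illustration, not a complete personal-data scanner: the field names in `PERSONAL_FIELDS`, the email pattern, and the `user_consent` flag are assumptions made for the example.

```python
import re

# Hypothetical field names treated as personal data in this sketch.
PERSONAL_FIELDS = {"name", "email", "phone", "address"}
# Simplified email pattern; real scanners cover many more identifier types.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def find_violations(records):
    """Return (index, reason) pairs for records holding personal data without consent."""
    violations = []
    for index, record in enumerate(records):
        has_personal_field = any(key in PERSONAL_FIELDS for key in record)
        has_email_value = any(
            isinstance(value, str) and EMAIL_PATTERN.search(value)
            for value in record.values()
        )
        consent_given = record.get("user_consent") is True
        if (has_personal_field or has_email_value) and not consent_given:
            violations.append((index, "personal data without recorded consent"))
    return violations


records = [
    {"email": "ana@example.com", "user_consent": True},
    {"email": "bob@example.com", "user_consent": False},
    {"note": "contact carol@example.com"},
    {"sepal_length": 5.1},
]
violations = find_violations(records)
print(violations)
```

A pipeline step could fail the build whenever `find_violations` returns a non-empty list, so that non-compliant data never reaches training.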
Config File - mlflow_tracking_config.py
import mlflow
import mlflow.sklearn

# Point the client at a tracking server that sits behind TLS and access control
mlflow.set_tracking_uri("https://mlflow.example.com")

# Artifact encryption and access logging are not set from this client script.
# Configure encryption at rest on the artifact store and request logging on the
# tracking server or the proxy in front of it.

def log_model_with_compliance(model, data_description):
    # Open a run explicitly so tags, params, and the model land in the same run
    with mlflow.start_run():
        # Compliance tags make the run easy to filter during audits
        mlflow.set_tag("compliance", "GDPR")
        mlflow.set_tag("data_privacy", "enabled")
        # Describe the training data to support data lineage
        mlflow.log_param("data_description", data_description)
        mlflow.sklearn.log_model(model, "model")
    print("Model logged with compliance tags and data description")

This Python config sets up MLflow tracking so that each run carries the records needed for GDPR and AI Act audits. It supports compliance work but does not make a system compliant on its own.

  • Tracking URI: Points to a secure MLflow server with access control.
  • Artifact encryption: Handled by the artifact store (for example, encryption at rest on the storage bucket), so stored model files are encrypted.
  • Access logging: Handled by the tracking server or its reverse proxy, which records who accesses model data for audits.
  • Tags: Label runs with compliance info for easy filtering.
  • Logging function: Logs the model and describes the data used, helping trace data lineage.
Commands
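Run the Python script to configure MLflow tracking with compliance settings. The config file on its own only defines the logging function; the message below is printed once log_model_with_compliance is called with a trained model and a data usage description.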
Terminal
python mlflow_tracking_config.py
Expected Output
Model logged with compliance tags and data description
Start the MLflow UI to visually inspect logged models, parameters, and compliance tags.
Terminal
mlflow ui
Expected Output
The startup log varies by MLflow version. With default settings the UI is served at http://127.0.0.1:5000.
--host - Bind the UI to a specific IP address for secure access
--port - Specify the port number for the UI
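Query the MLflow server REST API to find all runs tagged with GDPR compliance for audit purposes. The search endpoint takes a POST request with a JSON body; experiment ID "0" (the default experiment) is used here as an example.
Terminal
curl -X POST https://mlflow.example.com/api/2.0/mlflow/runs/search -H "Content-Type: application/json" -d '{"experiment_ids": ["0"], "filter": "tags.compliance = \"GDPR\""}'
Expected Output (abridged)
{"runs": [{"info": {"run_id": "1234abcd"}, "data": {"tags": [{"key": "compliance", "value": "GDPR"}, {"key": "data_privacy", "value": "enabled"}]}}]}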
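The same audit query can be prepared from Python with only the standard library. The sketch below builds the request without sending it, assuming the runs/search endpoint accepts a JSON body sent with POST as described in MLflow's REST API; the helper name `build_run_search_request` and the experiment ID `"0"` are illustrative choices.

```python
import json
import urllib.request


def build_run_search_request(base_url, tag_key, tag_value, experiment_ids):
    """Build (but do not send) a POST request for the runs/search endpoint."""
    payload = {
        "experiment_ids": list(experiment_ids),
        # MLflow filter syntax: tag values are quoted strings
        "filter": f'tags.{tag_key} = "{tag_value}"',
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/2.0/mlflow/runs/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_search_request(
    "https://mlflow.example.com", "compliance", "GDPR", ["0"]
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it; a real server would also need whatever authentication its access control requires.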
Key Concept

If you remember nothing else from this pattern, remember: always track and document data usage and model metadata to prove compliance with data protection laws.
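One way to make "document data usage" concrete is a small structured record stored next to each model version. The sketch below is only an illustration: the `DataUsageRecord` fields and the `fingerprint` helper are hypothetical, and a real record would follow whatever your legal and audit teams require.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class DataUsageRecord:
    """Hypothetical audit record describing the data behind one model version."""
    model_name: str
    data_source: str
    contains_personal_data: bool
    legal_basis: str
    dataset_sha256: str


def fingerprint(raw_bytes):
    """Hash the dataset bytes so the record points at one exact dataset snapshot."""
    return hashlib.sha256(raw_bytes).hexdigest()


# Tiny stand-in for the real training file contents
dataset = b"sepal_length,sepal_width\n5.1,3.5\n"
record = DataUsageRecord(
    model_name="iris-classifier",
    data_source="Iris dataset",
    contains_personal_data=False,
    legal_basis="not applicable (no personal data)",
    dataset_sha256=fingerprint(dataset),
)
print(json.dumps(asdict(record), indent=2))
```

The resulting JSON could be saved as a run artifact or logged as parameters, so the model and its data description stay together.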

Code Example
MLOps
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Load sample data
iris = load_iris()
X, y = iris.data, iris.target

# Train a simple model
model = LogisticRegression(max_iter=100)
model.fit(X, y)

# Set tracking URI
mlflow.set_tracking_uri("https://mlflow.example.com")

# Log tags, data description, and model inside one run
with mlflow.start_run():
    # Tags are set inside the run; calling set_tag before start_run
    # would implicitly start a separate run of its own
    mlflow.set_tag("compliance", "GDPR")
    mlflow.set_tag("data_privacy", "enabled")
    mlflow.log_param("data_description", "Iris dataset with no personal data")
    mlflow.sklearn.log_model(model, "model")
    print("Model logged with compliance tags and data description")
Expected Output
Model logged with compliance tags and data description
Common Mistakes
Not tagging MLflow runs with compliance-related metadata.
  • Why it matters: Without tags, it is hard to filter runs and show which models follow regulations during audits.
  • Fix: Always add clear tags such as 'compliance' and 'data_privacy' when logging models.
Storing model artifacts without encryption or access logs.
  • Why it matters: This risks unauthorized access to sensitive data and can conflict with GDPR's security requirements.
  • Fix: Enable encryption at rest on the artifact store and access logging on the MLflow tracking server or the proxy in front of it.
Not describing the data used for training in the model logs.
  • Why it matters: Auditors need to see what data was used to confirm it complies with consent and privacy rules.
  • Fix: Log parameters or tags that describe the data source and privacy status.
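The first and third mistakes can be caught automatically before a run is accepted. The following sketch checks plain dictionaries of tags and parameters; the required names mirror the ones used in this lesson, and the function `missing_compliance_metadata` is a hypothetical helper rather than part of MLflow.

```python
# Names required by this lesson's convention; adjust to your own policy.
REQUIRED_TAGS = {"compliance", "data_privacy"}
REQUIRED_PARAMS = {"data_description"}


def missing_compliance_metadata(tags, params):
    """Return sorted names of required tags and params that are absent or empty."""
    missing_tags = {f"tag:{key}" for key in REQUIRED_TAGS if not tags.get(key)}
    missing_params = {f"param:{key}" for key in REQUIRED_PARAMS if not params.get(key)}
    return sorted(missing_tags | missing_params)


complete = missing_compliance_metadata(
    {"compliance": "GDPR", "data_privacy": "enabled"},
    {"data_description": "Iris dataset with no personal data"},
)
incomplete = missing_compliance_metadata({"compliance": "GDPR"}, {})
print(complete)    # []
print(incomplete)  # ['param:data_description', 'tag:data_privacy']
```

Running such a check in CI, on the tags and parameters read back from the tracking server, keeps incomplete runs from reaching an audit.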
Summary
Point MLflow tracking at a secure server, with encryption at rest on the artifact store and access logs on the server or its proxy.
Tag model runs with compliance metadata to easily find and audit them later.
Log detailed data descriptions alongside models to prove lawful data use.

Practice

1. What is the main purpose of GDPR in the context of MLOps?
easy
A. To improve the speed of machine learning model training
B. To protect user data privacy and control how personal data is used
C. To increase the accuracy of AI predictions
D. To reduce the cost of cloud computing resources

Solution

  1. Step 1: Understand GDPR's focus

    GDPR is a law designed to protect personal data and privacy of individuals in the EU.
  2. Step 2: Relate GDPR to MLOps

    In MLOps, GDPR ensures that data used for training and deployment respects user privacy and consent.
  3. Final Answer:

    To protect user data privacy and control how personal data is used -> Option B
  4. Quick Check:

    GDPR = Protect user privacy [OK]
Hint: GDPR is about data privacy and user rights [OK]
Common Mistakes:
  • Confusing GDPR with performance improvements
  • Thinking GDPR controls AI accuracy
  • Assuming GDPR reduces costs
2. Which of the following is the correct way to document AI model compliance with the AI Act?
easy
A. Document only the training code without data details
B. Only save the final model weights without any metadata
C. Avoid documenting to protect intellectual property
D. Keep a detailed record of data sources, model decisions, and risk assessments

Solution

  1. Step 1: Understand AI Act documentation requirements

    For high-risk systems, the AI Act requires technical documentation and record-keeping that cover data sources, model behavior, and risk management.
  2. Step 2: Identify correct documentation practice

    Keeping detailed records ensures compliance and accountability for AI systems.
  3. Final Answer:

    Keep a detailed record of data sources, model decisions, and risk assessments -> Option D
  4. Quick Check:

    AI Act = Detailed compliance records [OK]
Hint: Document all data and risks for AI Act compliance [OK]
Common Mistakes:
  • Ignoring data source documentation
  • Saving only model weights without context
  • Not assessing risks or model decisions
3. Consider this Python snippet used in an MLOps pipeline to check GDPR compliance:
def check_data_compliance(data):
    if 'user_consent' in data and data['user_consent'] == True:
        return 'Compliant'
    else:
        return 'Non-compliant'

result = check_data_compliance({'user_consent': False})
print(result)
What will be the output?
medium
A. Compliant
B. True
C. Non-compliant
D. KeyError

Solution

  1. Step 1: Analyze the function logic

    The function checks if 'user_consent' key exists and is True; otherwise returns 'Non-compliant'.
  2. Step 2: Evaluate the input data

    The input has 'user_consent' set to False, so condition fails and returns 'Non-compliant'.
  3. Final Answer:

    Non-compliant -> Option C
  4. Quick Check:

    Consent False means Non-compliant [OK]
Hint: Check boolean condition carefully for True/False [OK]
Common Mistakes:
  • Assuming any 'user_consent' key means compliant
  • Expecting a KeyError when key exists
  • Confusing output with boolean True
4. You have this snippet to check AI Act compliance but it raises an error:
def validate_model_risk(risk_level):
    if risk_level = 'high':
        return 'Requires strict controls'
    else:
        return 'Standard controls'
What is the error and how to fix it?
medium
A. SyntaxError due to '=' instead of '==' in if condition; fix by using '=='
B. NameError because risk_level is undefined; fix by defining risk_level
C. IndentationError due to missing indent; fix by indenting return lines
D. TypeError because risk_level is not a string; fix by converting to string

Solution

  1. Step 1: Identify the error in the if statement

    The if condition uses '=' which is assignment, not comparison, causing SyntaxError.
  2. Step 2: Correct the comparison operator

    Replace '=' with '==' to compare risk_level to 'high' properly.
  3. Final Answer:

    SyntaxError due to '=' instead of '==' in if condition; fix by using '==' -> Option A
  4. Quick Check:

    Use '==' for comparison, not '=' [OK]
Hint: Use '==' for comparisons, '=' is assignment [OK]
Common Mistakes:
  • Using '=' instead of '==' in conditions
  • Confusing SyntaxError with NameError
  • Ignoring indentation correctness
5. You want to automate GDPR compliance checks in your MLOps pipeline. Which approach best ensures compliance before model deployment?
hard
A. Integrate automated data scanning tools to detect personal data and verify consent flags
B. Deploy models immediately and fix compliance issues if users complain
C. Skip data checks and rely on manual audits after deployment
D. Only check compliance for models trained outside the EU

Solution

  1. Step 1: Understand GDPR compliance automation

    Automated tools can scan data to detect personal information and check if user consent is present.
  2. Step 2: Evaluate deployment strategies

    Deploying without checks or relying on manual audits risks legal issues and user trust loss.
  3. Step 3: Choose best proactive approach

    Integrating automated compliance checks before deployment ensures issues are caught early and fixed.
  4. Final Answer:

    Integrate automated data scanning tools to detect personal data and verify consent flags -> Option A
  5. Quick Check:

    Automate compliance checks before deployment [OK]
Hint: Automate data and consent checks pre-deployment [OK]
Common Mistakes:
  • Ignoring compliance until after deployment
  • Relying only on manual audits
  • Assuming non-EU models don't need checks