How to use Docker for ML

How to Use Docker for Machine Learning Projects

Use Docker to create isolated containers that package your ML code, dependencies, and environment together. Write a Dockerfile to specify your setup, build an image with docker build, and run your ML tasks inside containers using docker run.

📐

Syntax

A typical Docker workflow for ML involves these steps:

Dockerfile: A text file defining the environment setup.
docker build: Command to create an image from the Dockerfile.
docker run: Command to start a container from the image and run your ML code.

This keeps your ML environment consistent and portable.

Dockerfile

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . ./
CMD ["python", "train.py"]
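The build and run steps above can also be driven from a script. The following is a minimal sketch rather than part of the workflow itself: the helper names build_command, run_command, and build_and_run are hypothetical, the tag my-ml-image is only an illustration, and it assumes the docker CLI is installed and on your PATH.

```python
from __future__ import annotations

import subprocess


def build_command(tag: str, context: str = ".") -> list[str]:
    """Return the argument list for `docker build -t <tag> <context>`."""
    return ["docker", "build", "-t", tag, context]


def run_command(tag: str, remove: bool = True) -> list[str]:
    """Return the argument list for `docker run [--rm] <tag>`."""
    cmd = ["docker", "run"]
    if remove:
        cmd.append("--rm")  # delete the container once it exits
    cmd.append(tag)
    return cmd


def build_and_run(tag: str, context: str = ".") -> None:
    """Build the image, then run it. Raises CalledProcessError on failure."""
    subprocess.run(build_command(tag, context), check=True)
    subprocess.run(run_command(tag), check=True)


# Show the commands that would be executed (hypothetical image tag).
print(" ".join(build_command("my-ml-image")))
print(" ".join(run_command("my-ml-image")))
```

Calling build_and_run("my-ml-image") from the directory that holds the Dockerfile would execute both steps. Passing the arguments as a list avoids shell quoting problems.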

💻

Example

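This example shows how to containerize a simple ML training script using Docker. Save the three files below in the same directory, build the image with docker build -t my-ml-image . and then start training with docker run --rm my-ml-image.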

bash

# Dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . ./
CMD ["python", "train.py"]

# requirements.txt
scikit-learn==1.2.2

# train.py
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

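# Load the Iris dataset and hold out 20% of it for testing
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Train a random forest; a fixed random_state makes runs repeatable
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out split
preds = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, preds):.2f}")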

Output

Accuracy: 1.00

⚠️

Common Pitfalls

Common mistakes when using Docker for ML include:

Not specifying exact package versions in requirements.txt, causing inconsistent environments.
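Copying unnecessary files (raw datasets, checkpoints, virtual environments) into the image, making it large and slow to build. A .dockerignore file keeps them out of the build context.
Forgetting to set the working directory, so COPY places your project files in the image's root directory (/). This clutters the image and can cause file-not-found errors when scripts expect a path such as /app.
Not publishing ports (-p) or mounting volumes (-v) when needed for model serving or data sharing.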

Always test your container locally before deployment.

Dockerfile

## Wrong Dockerfile snippet (no WORKDIR, so files are copied into the image root)
FROM python:3.9-slim
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . ./
CMD ["python", "train.py"]

## Correct Dockerfile snippet
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . ./
CMD ["python", "train.py"]
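The first pitfall in the list above, unpinned package versions, can be caught before you build. The following is a minimal sketch under stated assumptions: the helper name find_unpinned is hypothetical, it treats only == as a pin, and it handles simple one-requirement-per-line files rather than the full pip requirements syntax (URLs, environment markers, and similar are out of scope).

```python
from __future__ import annotations


def find_unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version with ==."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Hypothetical requirements file with one pinned and two unpinned entries.
sample = "scikit-learn==1.2.2\nnumpy>=1.24\npandas\n"
print(find_unpinned(sample))  # ['numpy>=1.24', 'pandas']
```

A check like this could run in CI or as a pre-build step, failing when the returned list is not empty.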

📊

Quick Reference

Here is a quick cheat-sheet for Docker commands useful in ML projects:

Command	Description
docker build -t my-ml-image .	Builds a Docker image named 'my-ml-image' from the current directory
docker run --rm my-ml-image	Runs the container and removes it after exit
docker run -v "$(pwd)/data:/app/data" my-ml-image	Mounts the local 'data' folder at /app/data inside the container
docker ps	Lists running containers
docker stop <container>	Stops a running container, given its ID or name
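The volume mount in the table needs an absolute host path, which $(pwd) supplies in a shell; a bare name would be treated by Docker as a named volume instead. As a rough sketch of composing the same command from Python, the helper below resolves the host folder with pathlib. The function name run_with_data_mount and its default arguments are illustrative assumptions.

```python
from __future__ import annotations

from pathlib import Path


def run_with_data_mount(
    tag: str,
    host_dir: str = "data",
    container_dir: str = "/app/data",
) -> list[str]:
    """Return the argument list for a `docker run` with one bind mount."""
    host_path = Path(host_dir).resolve()  # bind mounts need an absolute path
    return ["docker", "run", "--rm", "-v", f"{host_path}:{container_dir}", tag]


# Print the command for a hypothetical image tag; nothing is executed here.
print(run_with_data_mount("my-ml-image"))
```

The returned list can be passed to subprocess.run(..., check=True) to start the container.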

✅

Key Takeaways

Write a Dockerfile to define your ML environment and dependencies clearly.

Use docker build and docker run commands to create and run containers for ML tasks.

Pin package versions in requirements.txt to ensure reproducible environments.

Keep your Docker images small by copying only necessary files.

Test containers locally before deploying ML models or pipelines.