How to Use Docker for Machine Learning Projects
Use Docker to create isolated containers that package your ML code, dependencies, and environment together. Write a Dockerfile to specify your setup, build an image with docker build, and run your ML tasks inside containers using docker run.
Syntax
A typical Docker workflow for ML involves these steps:
- Dockerfile: A text file defining the environment setup.
- docker build: Command to create an image from the Dockerfile.
- docker run: Command to start a container from the image and run your ML code.
This keeps your ML environment consistent and portable.
Dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . ./
CMD ["python", "train.py"]
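The build and run steps can also be scripted. The following is a minimal Python sketch, assuming the Docker CLI is installed and on the PATH. The image tag my-ml-image and the helper names build_command and run_command are illustrative and not part of Docker. The sketch only assembles and prints the command lines; it does not execute them.

```python
import shlex


def build_command(tag: str, context: str = ".") -> list[str]:
    """Return the argument list for `docker build -t <tag> <context>`."""
    return ["docker", "build", "-t", tag, context]


def run_command(tag: str, remove: bool = True) -> list[str]:
    """Return the argument list for `docker run [--rm] <tag>`."""
    cmd = ["docker", "run"]
    if remove:
        cmd.append("--rm")  # delete the container once it exits
    cmd.append(tag)
    return cmd


# Hypothetical tag, used only for illustration.
for cmd in (build_command("my-ml-image"), run_command("my-ml-image")):
    print(shlex.join(cmd))
    # To execute for real: subprocess.run(cmd, check=True)
```

Keeping each command as a list of arguments means it can be passed to subprocess.run without going through a shell, which avoids quoting problems.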
Example
This example shows how to containerize a simple ML training script using Docker.
# Dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . ./
CMD ["python", "train.py"]

# requirements.txt
scikit-learn==1.2.2

# train.py
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the Iris dataset and hold out 20% of it for testing.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Fixed seed so repeated runs of the container give the same model.
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

preds = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, preds):.2f}")
Output
Accuracy: 1.00
Common Pitfalls
Common mistakes when using Docker for ML include:
- Not specifying exact package versions in requirements.txt, causing inconsistent environments.
- Copying unnecessary files into the image, making it large and slow.
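- Forgetting to set the working directory, so files are copied into the image's root directory and relative paths in your code can fail with file-not-found errors.
- Not publishing ports (-p) or mounting volumes (-v) when needed for model serving or data sharing.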
Always test your container locally before deployment.
Dockerfile
# Wrong Dockerfile snippet (no working directory)
FROM python:3.9-slim
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . ./
CMD ["python", "train.py"]

# Correct Dockerfile snippet
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . ./
CMD ["python", "train.py"]
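The first pitfall, unpinned package versions, can be caught before building the image. The following is a rough Python sketch of such a check. The function name find_unpinned and the sample entries are hypothetical, and the pattern is a simplification: it treats only plain name==version lines as pinned and does not handle environment markers or URL requirements.

```python
import re

# Matches a requirement pinned to one exact version, e.g. "scikit-learn==1.2.2".
PINNED = re.compile(r"^[A-Za-z0-9._\-\[\],]+==[^=\s]+$")


def find_unpinned(lines):
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as "-r other.txt"
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


# Hypothetical requirements.txt contents, for illustration only.
sample = ["scikit-learn==1.2.2", "numpy>=1.24", "pandas", "# a comment"]
print(find_unpinned(sample))  # prints ['numpy>=1.24', 'pandas']
```

In practice the lines would come from reading requirements.txt, and a non-empty result could be used to stop the build early.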
Quick Reference
Here is a quick cheat-sheet for Docker commands useful in ML projects:
| Command | Description |
|---|---|
| docker build -t my-ml-image . | Builds a Docker image named 'my-ml-image' from the current directory |
| docker run --rm my-ml-image | Runs the container and removes it after exit |
| docker run -v $(pwd)/data:/app/data my-ml-image | Mounts local 'data' folder inside container |
| docker ps | Lists running containers |
| docker stop CONTAINER | Stops a running container, identified by its ID or name |
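The volume-mount command in the table relies on the shell expanding $(pwd). As a hedged alternative, the same command can be assembled in Python with an absolute path. The helper name run_with_data_mount and its default arguments are assumptions for illustration; only the docker run flags come from the table above.

```python
from pathlib import Path


def run_with_data_mount(tag, host_dir="data", container_dir="/app/data"):
    """Return the argument list for `docker run --rm -v <host>:<container> <tag>`."""
    # Resolve to an absolute path: a bare name such as "data" would be
    # interpreted by Docker as a named volume rather than a local folder.
    host_path = Path(host_dir).resolve()
    return ["docker", "run", "--rm", "-v", f"{host_path}:{container_dir}", tag]


# Hypothetical image tag, for illustration only.
cmd = run_with_data_mount("my-ml-image")
print(cmd)
# To execute for real: subprocess.run(cmd, check=True)
```

Because the mount is passed as a single list element, a project path containing spaces does not need extra quoting.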
Key Takeaways
Write a Dockerfile to define your ML environment and dependencies clearly.
Use docker build and docker run commands to create and run containers for ML tasks.
Pin package versions in requirements.txt to ensure reproducible environments.
Keep your Docker images small by copying only necessary files.
Test containers locally before deploying ML models or pipelines.