
Multi-stage builds for smaller images in MLOps - Deep Dive

Overview - Multi-stage builds for smaller images
What is it?
Multi-stage builds are a way to create smaller, efficient container images by using multiple steps in one build process. Each step can use a different base image and only the necessary parts are copied to the final image. This helps keep the final container lightweight and fast to deploy. It is especially useful in machine learning operations where images can get large due to dependencies.
Why it matters
Without multi-stage builds, container images often include unnecessary files and tools, making them large and slow to transfer or start. This wastes storage and network resources and slows down deployment. Multi-stage builds solve this by separating build-time and runtime environments, which reduces image size and shortens pull and startup times. This means faster updates, lower cost, and more reliable deployments.
Where it fits
Before learning multi-stage builds, you should understand basic Docker images and Dockerfile syntax. After mastering multi-stage builds, you can explore advanced image optimization, security scanning, and automated CI/CD pipelines that build and deploy these images efficiently.
Mental Model
Core Idea
Multi-stage builds let you use multiple temporary containers to build your app, then copy only the final needed parts into a small, clean image.
Think of it like...
It's like cooking a meal in several pots but only serving the finished dish on the plate, leaving all the dirty pots behind in the kitchen.
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│ Build Stage 1 │──────▶│ Build Stage 2 │──────▶│ Final Image   │
│ (compile code)│       │ (prepare app) │       │ (runtime only)│
└───────────────┘       └───────────────┘       └───────────────┘
          │                      │                      ▲
          │                      │                      │
          └─────────────copy─────┴──────────────────────┘
Build-Up - 7 Steps
1
Foundation: Understanding basic Docker images
Concept: Learn what a Docker image is and how it contains everything needed to run an app.
A Docker image is like a snapshot of a computer environment. It includes the app code, system libraries, and tools. You write a Dockerfile to describe how to build this image step-by-step. When you run the image, it creates a container that runs your app.
Result
You can create and run a simple Docker image with your app inside.
Understanding images as self-contained environments helps you see why controlling their size and contents matters.
2
Foundation: Writing a simple Dockerfile
Concept: Learn the basic commands to create a Dockerfile that builds an image.
A Dockerfile starts with a base image, then adds files and runs commands. For example:
    FROM python:3.9-slim
    COPY app.py /app/
    CMD ["python", "/app/app.py"]
This builds an image that runs a Python app.
Result
You can build and run a Docker image from your Dockerfile.
Knowing how to write a Dockerfile is the foundation for using multi-stage builds.
3
Intermediate: Introducing multi-stage build syntax
Before reading on: do you think a Dockerfile can have multiple FROM lines? Commit to yes or no.
Concept: Learn that Dockerfiles can have multiple FROM commands to create stages.
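Multi-stage builds use multiple FROM lines, each starting a new stage. For example:
    FROM golang:1.20 AS builder
    WORKDIR /src
    # Copy the source into the stage so there is something to compile
    COPY . .
    # CGO_ENABLED=0 yields a static binary that also runs on Alpine (musl libc)
    RUN CGO_ENABLED=0 go build -o /app .
    FROM alpine:3.18
    COPY --from=builder /app /app
    CMD ["/app"]
The first stage compiles the app; the second creates a small image that contains only the app binary.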
Result
You create a smaller final image by copying only needed files from the build stage.
Knowing that each FROM starts a new stage lets you separate build and runtime environments cleanly.
4
Intermediate: Copying artifacts between stages
Before reading on: do you think you can copy files from one stage to another using COPY? Commit to yes or no.
Concept: Learn how to copy files from one build stage to another using the --from flag.
The COPY command can use --from=stage_name to copy files from a previous stage. For example:
    COPY --from=builder /app /app
This copies the built app from the builder stage into the final image.
Result
You can selectively include only the files you want in the final image.
Understanding this selective copying is key to reducing image size and keeping runtime clean.
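As a rough sketch of the forms --from can take, the fragment below copies from a stage by name, by zero-based index, and from an external image. The paths /src/out/app and /src/LICENSE and the placeholder build step are invented for illustration; the nginx line follows the pattern shown in Docker's own documentation.

```dockerfile
# Stage 0, also addressable by the name "builder"
FROM alpine:3.18 AS builder
WORKDIR /src
# Stand-in for a real build step: produce two files to copy later
RUN mkdir out && echo "binary placeholder" > out/app && echo "license text" > LICENSE

FROM alpine:3.18
# Copy by stage name (keeps working if stages are reordered)
COPY --from=builder /src/out/app /usr/local/bin/app
# Copy by zero-based stage index (works, but shifts if a stage is inserted earlier)
COPY --from=0 /src/LICENSE /licenses/LICENSE
# Copy from an external image instead of a stage in this Dockerfile
COPY --from=nginx:latest /etc/nginx/nginx.conf /etc/nginx/nginx.conf
CMD ["cat", "/usr/local/bin/app"]
```

Names are usually the safer choice in longer Dockerfiles, since an index silently points at a different stage once the file changes.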
5
Intermediate: Reducing image size with multi-stage builds
Concept: Learn how multi-stage builds help remove build tools and dependencies from the final image.
Build tools like compilers are needed only during build, not runtime. By using a build stage with all tools, then copying only the final app to a minimal base image, you avoid including bulky tools in the final image. This can reduce image size by hundreds of megabytes.
Result
Final images are smaller, faster to download, and more secure.
Knowing that build and runtime environments differ helps you optimize images for production.
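One common way to apply this to a Python app is to install dependencies into a virtual environment in a full-size build stage, then copy that environment into a slim runtime stage. The sketch below assumes a requirements.txt and an app.py next to the Dockerfile, and it assumes both stages use the same Python version so the copied environment stays valid. Packages that link against system libraries may still need those libraries installed in the runtime stage.

```dockerfile
# Build stage: full image, with compilers available for packages that build native code
FROM python:3.9 AS builder
RUN python -m venv /opt/venv
# Put the virtual environment first on PATH so pip installs into it
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Runtime stage: slim image without the compiler toolchain
FROM python:3.9-slim
# Only the finished environment crosses over, not the build tools
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY app.py .
CMD ["python", "app.py"]
```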
6
Advanced: Using multi-stage builds in ML pipelines
Before reading on: do you think multi-stage builds can help reduce ML container sizes by excluding training dependencies? Commit to yes or no.
Concept: Learn how multi-stage builds separate heavy ML training dependencies from lightweight inference images.
ML projects often need large libraries for training but only a few for inference. Use a build stage with all training tools to prepare models, then copy only the model and minimal runtime dependencies to the final image. This keeps inference containers small and efficient.
Result
ML deployment images are smaller and faster, improving scalability and cost.
Understanding this separation improves ML deployment workflows and resource use.
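The separation described above might look like the following sketch. Every file name here is hypothetical: requirements-train.txt, requirements-inference.txt, export_model.py (assumed to write a serialized model into /work/artifacts/), and serve.py (assumed to load the model from /app/model/).

```dockerfile
# Stage 1: heavy environment with the training and export dependencies
FROM python:3.9 AS trainer
WORKDIR /work
# Hypothetical manifest listing the full training stack
COPY requirements-train.txt .
RUN pip install --no-cache-dir -r requirements-train.txt
# Hypothetical script that writes the serialized model to /work/artifacts/
COPY export_model.py .
RUN python export_model.py

# Stage 2: lean inference image
FROM python:3.9-slim
WORKDIR /app
# Hypothetical manifest listing only what serving needs
COPY requirements-inference.txt .
RUN pip install --no-cache-dir -r requirements-inference.txt
# Bring over the model artifact only; the training stack stays behind
COPY --from=trainer /work/artifacts/ /app/model/
COPY serve.py .
CMD ["python", "serve.py"]
```

Keeping two dependency manifests is the design choice that makes this work: the final stage never installs the training libraries, so they cannot end up in the inference image.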
7
Expert: Advanced tricks and pitfalls in multi-stage builds
Before reading on: do you think adding many stages always reduces image size? Commit to yes or no.
Concept: Learn subtle behaviors like caching, layer ordering, and when multi-stage builds might not reduce size.
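Docker caches the result of each instruction. Changing an instruction, or the files it reads, invalidates the cache for every later instruction in that stage and can force later stages that copy from it to rebuild. Also, copying large files multiple times can increase size, because each COPY adds its own layer. Sometimes multi-stage builds add complexity without any size benefit if they are not designed carefully. Using .dockerignore and minimizing copied files is essential.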
Result
You build efficient images and avoid common mistakes that increase build time or image size.
Knowing Docker's caching and layering helps you write smarter multi-stage Dockerfiles.
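A minimal sketch of cache-friendly ordering, assuming a requirements.txt and a src/ package with a main module (both hypothetical): the dependency manifest is copied and installed before the frequently edited source, so a code change does not trigger a full reinstall.

```dockerfile
FROM python:3.9 AS builder
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Dependencies change rarely: copy only the manifest so this layer and the
# install below stay cached while application code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM python:3.9-slim
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
# Source changes often: copy it last so edits invalidate only this layer.
# A .dockerignore file (not shown) would keep data sets, caches, and .git
# out of the build context entirely.
COPY src/ ./src/
CMD ["python", "-m", "src.main"]
```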
Under the Hood
Docker builds images in layers. Instructions that change the filesystem (RUN, COPY, ADD) each create a new layer, while instructions such as ENV or CMD only update image metadata. Multi-stage builds create a separate intermediate image for each stage. Only the final stage (or the stage selected with --target) is saved as the output image. COPY --from copies files out of an intermediate image without including that image's layers. This avoids carrying build tools and files into the final image, reducing size.
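To make the layer behavior concrete, here is a small annotated sketch. The artifact path and the APP_ENV variable are made up for illustration.

```dockerfile
# Stage "builder": its layers exist only in the intermediate image
FROM python:3.9 AS builder
# RUN adds a filesystem layer holding whatever the command wrote
RUN mkdir /out && echo "built artifact" > /out/artifact.txt

# Final stage: starts again from a fresh base image
FROM python:3.9-slim
# ENV changes image metadata only; it adds no filesystem content
ENV APP_ENV=production
# COPY adds one layer containing just the copied file,
# not the layers of the builder stage
COPY --from=builder /out/artifact.txt /app/artifact.txt
# CMD is metadata as well
CMD ["cat", "/app/artifact.txt"]
```

Running docker history on the resulting image should list the base image layers plus the single COPY layer, with no trace of the builder stage's RUN step.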
Why designed this way?
Originally, Docker images included all build tools, making them large. Multi-stage builds were introduced to separate build and runtime environments in one Dockerfile, simplifying workflows and reducing image size. Alternatives like separate Dockerfiles or manual cleanup were error-prone and complex.
┌───────────────┐    build    ┌───────────────┐
│ Stage 1: Build│───────────▶│ Intermediate   │
│ (with tools)  │            │ image layers   │
└───────────────┘            └───────────────┘
         │                          │
         │ copy files               │ copy only needed files
         ▼                          ▼
┌───────────────┐            ┌───────────────┐
│ Stage 2: Final│            │ Final Image   │
│ (minimal base)│            │ (small size)  │
└───────────────┘            └───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does multi-stage build always make your image smaller? Commit yes or no.
Common Belief: Multi-stage builds always reduce image size significantly.
Reality: If not designed carefully, a multi-stage build can produce an image as large as a single-stage one, for example when the final stage copies unnecessary files or reinstalls build tools.
Why it matters: Assuming size always shrinks can lead to bloated images and longer build times.
Quick: Can you copy files from any stage in any order? Commit yes or no.
Common Belief: You can copy files from any stage regardless of order or naming.
Reality: Within a Dockerfile, you can only copy from stages defined before the current stage, and you must refer to them by the correct name or zero-based index. COPY --from also accepts an external image reference.
Why it matters: Misusing COPY --from causes build errors and confusion.
Quick: Does multi-stage build remove all build tools from the final image automatically? Commit yes or no.
Common Belief: Multi-stage builds automatically remove build tools from the final image.
Reality: You must explicitly copy only needed files; build tools remain if copied or installed in the final stage.
Why it matters: Assuming automatic cleanup can cause large images and security risks.
Quick: Are multi-stage builds an outdated, legacy Docker feature? Commit yes or no.
Common Belief: Multi-stage builds are a legacy Docker feature.
Reality: Multi-stage builds were introduced in Docker 17.05 and are a modern best practice.
Why it matters: Falling back on older methods, such as separate build and runtime Dockerfiles, misses out on efficiency and simplicity.
Expert Zone
1
Using named stages improves readability and maintainability of complex Dockerfiles.
2
Each stage has its own chain of cached layers; placing rarely changing instructions first within a stage maximizes cache reuse and speeds up builds.
3
Combining multi-stage builds with a .dockerignore file keeps unnecessary files out of the build context, so broad instructions such as COPY . cannot pull them into any stage.
When NOT to use
Avoid multi-stage builds when your app is extremely simple or when build and runtime environments must be identical for debugging. In such cases, single-stage builds or specialized build tools like Bazel may be better.
Production Patterns
In production, multi-stage builds are combined with CI/CD pipelines that build, test, and push optimized images automatically. Common patterns include separate build stages for compiling code, running tests, and packaging minimal runtime images with only necessary binaries and config.
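A sketch of such a pattern with a shared base, a test stage, and a runtime stage is shown below. The src/ and tests/ directories, the src.main module, and the stage names are assumptions made for illustration.

```dockerfile
# Shared base: dependencies are installed once and reused by later stages
FROM python:3.9-slim AS base
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/

# Test stage: adds test-only tooling; a CI job can build it with
#   docker build --target test .
FROM base AS test
RUN pip install --no-cache-dir pytest
COPY tests/ ./tests/
# A failing test fails the build of this stage
RUN python -m pytest tests/

# Runtime stage: built from base, so pytest and the tests are not included
FROM base AS runtime
CMD ["python", "-m", "src.main"]
```

Note that with BuildKit, a default build of the last stage skips stages it does not depend on, so a pipeline using this layout would need to build the test target explicitly before pushing the runtime image.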
Connections
Continuous Integration/Continuous Deployment (CI/CD)
Multi-stage builds are often integrated into CI/CD pipelines to automate efficient image creation and deployment.
Understanding multi-stage builds helps optimize CI/CD workflows by producing smaller, faster images that speed up delivery.
Software Build Systems
Multi-stage builds mirror concepts in build systems that separate compilation and packaging steps.
Knowing build systems clarifies why separating build and runtime environments reduces complexity and size.
Manufacturing Assembly Lines
Multi-stage builds resemble assembly lines where components are built separately then assembled into a final product.
Seeing multi-stage builds as assembly lines helps grasp the efficiency gained by modular, staged construction.
Common Pitfalls
#1 Including build tools in the final image by installing them in the last stage.
Wrong approach:
    FROM python:3.9
    RUN apt-get update && apt-get install -y build-essential
    COPY . /app
    # -C runs make inside /app; "make /app" would look for a target named /app
    RUN make -C /app
    CMD ["python", "/app/app.py"]
Correct approach:
    FROM python:3.9 AS builder
    RUN apt-get update && apt-get install -y build-essential
    COPY . /app
    RUN make -C /app
    FROM python:3.9-slim
    COPY --from=builder /app /app
    CMD ["python", "/app/app.py"]
Root cause:Not separating build and runtime stages causes build tools to remain in the final image.
#2 Copying the entire build directory instead of only the needed files.
Wrong approach:
    COPY --from=builder /app /app
Correct approach:
    COPY --from=builder /app/bin/app /app/bin/app
    COPY --from=builder /app/config.yaml /app/config.yaml
Root cause:Copying everything without filtering includes unnecessary files, increasing image size.
#3 Using multiple FROM lines without naming stages, causing confusion.
Wrong approach:
    FROM node:18
    WORKDIR /app
    COPY . .
    RUN npm install && npm run build
    FROM node:18
    COPY --from=0 /app /app
Correct approach:
    FROM node:18 AS builder
    WORKDIR /app
    COPY . .
    RUN npm install && npm run build
    FROM node:18
    COPY --from=builder /app /app
Root cause:Not naming stages makes COPY --from references unclear and error-prone.
Key Takeaways
Multi-stage builds let you create smaller, cleaner container images by separating build and runtime steps.
Using multiple FROM commands, you can build your app in one stage and copy only what you need to the final image.
This approach reduces image size, speeds up deployment, and improves security by excluding build tools from runtime.
Understanding Docker's layer caching and careful file copying is essential to maximize multi-stage build benefits.
Multi-stage builds are a modern best practice in containerization, especially valuable in complex environments like machine learning.