
Multi-stage builds for smaller images in MLOps - Deep Dive

Overview - Multi-stage builds for smaller images
What is it?
Multi-stage builds are a way to create smaller, efficient container images by using multiple steps in one build process. Each step can use a different base image and only the necessary parts are copied to the final image. This helps keep the final container lightweight and fast to deploy. It is especially useful in machine learning operations where images can get large due to dependencies.
Why it matters
Without multi-stage builds, container images often include unnecessary files and tools, making them large and slow to transfer or start. This wastes storage and network resources and slows down deployment. Multi-stage builds solve this by separating build-time and runtime environments, reducing image size and improving performance. This means faster updates, less cost, and more reliable deployments.
Where it fits
Before learning multi-stage builds, you should understand basic Docker images and Dockerfile syntax. After mastering multi-stage builds, you can explore advanced image optimization, security scanning, and automated CI/CD pipelines that build and deploy these images efficiently.
Mental Model
Core Idea
Multi-stage builds let you use multiple temporary containers to build your app, then copy only the final needed parts into a small, clean image.
Think of it like...
It's like cooking a meal in several pots but only serving the finished dish on the plate, leaving all the dirty pots behind in the kitchen.
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│ Build Stage 1 │──────▶│ Build Stage 2 │──────▶│ Final Image   │
│ (compile code)│       │ (prepare app) │       │ (runtime only)│
└───────────────┘       └───────────────┘       └───────────────┘
          │                      │                      ▲
          │                      │                      │
          └─────────────copy─────┴──────────────────────┘
Build-Up - 7 Steps
1. Foundation: Understanding basic Docker images
Concept: Learn what a Docker image is and how it contains everything needed to run an app.
A Docker image is like a snapshot of a computer environment. It includes the app code, system libraries, and tools. You write a Dockerfile to describe how to build this image step-by-step. When you run the image, it creates a container that runs your app.
Result
You can create and run a simple Docker image with your app inside.
Understanding images as self-contained environments helps you see why controlling their size and contents matters.
2. Foundation: Writing a simple Dockerfile
Concept: Learn the basic commands to create a Dockerfile that builds an image.
A Dockerfile starts with a base image, then adds files and runs commands. For example:
FROM python:3.9-slim
COPY app.py /app/
CMD ["python", "/app/app.py"]
This builds an image that runs a Python app.
Result
You can build and run a Docker image from your Dockerfile.
Knowing how to write a Dockerfile is the foundation for using multi-stage builds.
3. Intermediate: Introducing multi-stage build syntax
Before reading on: do you think a Dockerfile can have multiple FROM lines? Commit to yes or no.
Concept: Learn that Dockerfiles can have multiple FROM commands to create stages.
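Multi-stage builds use multiple FROM lines, each starting a new stage. For example:
FROM golang:1.20 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .
FROM alpine:3.18
COPY --from=builder /app /app
CMD ["/app"]
The first stage copies the source and compiles the app; the second produces a small image that contains only the app binary. Setting CGO_ENABLED=0 yields a statically linked binary, so it does not depend on libraries that exist only in the builder image.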
Result
You create a smaller final image by copying only needed files from the build stage.
Knowing that each FROM starts a new stage lets you separate build and runtime environments cleanly.
4. Intermediate: Copying artifacts between stages
Before reading on: do you think you can copy files from one stage to another using COPY? Commit to yes or no.
Concept: Learn how to copy files from one build stage to another using the --from flag.
The COPY command can use --from=stage_name to copy files from a previous stage. For example: COPY --from=builder /app /app This copies the built app from the builder stage into the final image.
Result
You can selectively include only the files you want in the final image.
Understanding this selective copying is key to reducing image size and keeping runtime clean.
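The forms that --from accepts can be sketched in one Dockerfile. This is a minimal illustration, not a recommended layout: it assumes the build context holds a Go module, and the paths /src and /out/server and the nginx:1.25 image are placeholders chosen for the example.

```dockerfile
# Stage 0, also addressable by the name "builder"
FROM golang:1.20 AS builder
WORKDIR /src
COPY . .
# CGO_ENABLED=0 yields a static binary that does not need the builder's libc
RUN CGO_ENABLED=0 go build -o /out/server .

FROM alpine:3.18
# Copy by stage name (keeps working if stages are reordered)
COPY --from=builder /out/server /usr/local/bin/server
# Copy by zero-based stage index; equivalent here, but brittle
# COPY --from=0 /out/server /usr/local/bin/server
# Copy from an image that is not a stage in this Dockerfile
COPY --from=nginx:1.25 /etc/nginx/nginx.conf /etc/nginx/nginx.conf
CMD ["server"]
```

Named stages are usually the safer choice, because inserting a new stage shifts every index after it.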
5. Intermediate: Reducing image size with multi-stage builds
Concept: Learn how multi-stage builds help remove build tools and dependencies from the final image.
Build tools like compilers are needed only during build, not runtime. By using a build stage with all tools, then copying only the final app to a minimal base image, you avoid including bulky tools in the final image. This can reduce image size by hundreds of megabytes.
Result
Final images are smaller, faster to download, and more secure.
Knowing that build and runtime environments differ helps you optimize images for production.
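As a rough sketch of this separation for a Python app, the build stage below installs a compiler toolchain and builds dependencies into a virtual environment, and the runtime stage copies only that environment. The files requirements.txt and app.py and the path /opt/venv are assumptions made for the example.

```dockerfile
# Build stage: compiler toolchain is available for packages with C extensions
FROM python:3.12-slim AS builder
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*
# Install dependencies into an isolated virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Runtime stage: same Python base, but no compiler is ever installed
FROM python:3.12-slim
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY app.py .
CMD ["python", "app.py"]
```

Copying a virtual environment between stages relies on both stages having the interpreter at the same path, which holds here because both use the same base image.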
6. Advanced: Using multi-stage builds in ML pipelines
Before reading on: do you think multi-stage builds can help reduce ML container sizes by excluding training dependencies? Commit to yes or no.
Concept: Learn how multi-stage builds separate heavy ML training dependencies from lightweight inference images.
ML projects often need large libraries for training but only a few for inference. Use a build stage with all training tools to prepare models, then copy only the model and minimal runtime dependencies to the final image. This keeps inference containers small and efficient.
Result
ML deployment images are smaller and faster, improving scalability and cost.
Understanding this separation improves ML deployment workflows and resource use.
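One way this could look is sketched below. Everything named here is hypothetical: train.py is assumed to write a model file to the path given by --output, serve.py is assumed to load model.pkl from its working directory, and the two requirements files are assumed to list training and inference dependencies separately.

```dockerfile
# Training stage: heavy dependencies, produces a model artifact
FROM python:3.12 AS trainer
WORKDIR /work
COPY requirements-train.txt .
RUN pip install --no-cache-dir -r requirements-train.txt
COPY train.py .
COPY data/ ./data/
# Hypothetical script that writes the trained model to /work/model.pkl
RUN python train.py --output /work/model.pkl

# Inference stage: only serving dependencies and the model file
FROM python:3.12-slim
WORKDIR /app
COPY requirements-serve.txt .
RUN pip install --no-cache-dir -r requirements-serve.txt
COPY --from=trainer /work/model.pkl ./model.pkl
COPY serve.py .
CMD ["python", "serve.py"]
```

Training during the image build only suits small models and datasets. In many pipelines the first stage would instead export or convert a model that was trained elsewhere, but the copy into the slim final stage works the same way.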
7. Expert: Advanced tricks and pitfalls in multi-stage builds
Before reading on: do you think adding many stages always reduces image size? Commit to yes or no.
Concept: Learn subtle behaviors like caching, layer ordering, and when multi-stage builds might not reduce size.
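Docker caches the result of each instruction. Changing an instruction, or a file it copies, invalidates the cache for that instruction and every later instruction in the same stage, and can invalidate later stages that build on or copy from it. Each COPY also adds a layer, so copying the same large files more than once increases image size. More stages do not automatically mean a smaller image: a poorly designed multi-stage build adds complexity without any size benefit. Use a .dockerignore file and copy only the files the final image needs.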
Result
You build efficient images and avoid common mistakes that increase build time or image size.
Knowing Docker's caching and layering helps you write smarter multi-stage Dockerfiles.
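The effect of instruction order on caching can be sketched with a Node.js build. The example assumes the project has a package-lock.json, that `npm run build` writes a self-contained bundle to dist/, and that a .dockerignore file excludes node_modules from the build context.

```dockerfile
FROM node:18 AS builder
WORKDIR /app
# Dependency manifests change rarely: copy them first so the install layer stays cached
COPY package.json package-lock.json ./
RUN npm ci
# Source files change often: only the instructions from here on are re-run
COPY . .
RUN npm run build

FROM node:18-alpine
WORKDIR /app
# Copy only the build output, not the source tree or node_modules
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/server.js"]
```

If `COPY . .` came before `npm ci`, every source edit would invalidate the dependency install as well.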
Under the Hood
Docker builds images in layers. Instructions that change the filesystem (RUN, COPY, and ADD) each create a new layer; other instructions only add metadata. Each stage in a multi-stage build produces its own intermediate image. By default, only the final stage is saved as the output image. COPY --from copies files out of an earlier stage without bringing along that stage's layers, so build tools and temporary files never reach the final image.
Why designed this way?
Originally, Docker images included all build tools, making them large. Multi-stage builds were introduced to separate build and runtime environments in one Dockerfile, simplifying workflows and reducing image size. Alternatives like separate Dockerfiles or manual cleanup were error-prone and complex.
┌───────────────┐    build    ┌───────────────┐
│ Stage 1: Build│───────────▶│ Intermediate   │
│ (with tools)  │            │ image layers   │
└───────────────┘            └───────────────┘
         │                          │
         │ copy files               │ copy only needed files
         ▼                          ▼
┌───────────────┐            ┌───────────────┐
│ Stage 2: Final│            │ Final Image   │
│ (minimal base)│            │ (small size)  │
└───────────────┘            └───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does a multi-stage build always make your image smaller? Commit yes or no.
Common Belief: Multi-stage builds always reduce image size significantly.
Reality: If not designed carefully, a multi-stage build can produce an image as large as a single-stage one, for example when the final stage copies whole directories or reinstalls build tools.
Why it matters: Assuming size always shrinks can lead to bloated images and longer build times.
Quick: Can you copy files from any stage in any order? Commit yes or no.
Common Belief: You can copy files from any stage regardless of order or naming.
Reality: A stage can only copy from stages defined earlier in the Dockerfile, referenced by name or zero-based index. The --from flag also accepts an external image reference, such as nginx:latest.
Why it matters: Misusing COPY --from causes build errors and confusion.
Quick: Does a multi-stage build remove all build tools from the final image automatically? Commit yes or no.
Common Belief: Multi-stage builds automatically remove build tools from the final image.
Reality: You must explicitly copy only needed files; build tools remain if copied or installed in the final stage.
Why it matters: Assuming automatic cleanup can cause large images and security risks.
Quick: Are multi-stage builds a legacy Docker feature? Commit yes or no.
Common Belief: Multi-stage builds are a legacy Docker feature.
Reality: Multi-stage builds were introduced in Docker 17.05 and remain a current best practice.
Why it matters: Falling back on older methods, such as separate build and runtime Dockerfiles, gives up efficiency and simplicity.
Expert Zone
1. Using named stages improves readability and maintainability of complex Dockerfiles.
2. Each stage has its own chain of cached layers. Placing rarely changing instructions (such as dependency installation) before frequently changing ones (such as copying source code) maximizes cache reuse and speeds up builds.
3. Combining multi-stage builds with a .dockerignore file keeps unnecessary files out of the build context, so a broad COPY cannot pull them into any stage.
When NOT to use
Avoid multi-stage builds when your app is extremely simple or when build and runtime environments must be identical for debugging. In such cases, single-stage builds or specialized build tools like Bazel may be better.
Production Patterns
In production, multi-stage builds are combined with CI/CD pipelines that build, test, and push optimized images automatically. Common patterns include separate build stages for compiling code, running tests, and packaging minimal runtime images with only necessary binaries and config.
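A sketch of the compile, test, and package pattern follows. It assumes a Go module with go.mod and go.sum at the root of the build context and a config.yaml next to the Dockerfile; the stage names build, test, and runtime are arbitrary.

```dockerfile
# Compile stage
FROM golang:1.20 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Test stage: starts from the compile stage so sources and modules are already present
FROM build AS test
RUN go test ./...

# Runtime stage: only the binary and its configuration
FROM alpine:3.18 AS runtime
COPY --from=build /out/app /usr/local/bin/app
COPY config.yaml /etc/app/config.yaml
CMD ["app"]
```

A CI job can run the tests with `docker build --target test .` and then build the default final stage for publishing. With BuildKit, stages that the requested target does not depend on are skipped, so the test stage does not run during the publishing build unless it is targeted.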
Connections
Continuous Integration/Continuous Deployment (CI/CD)
Multi-stage builds are often integrated into CI/CD pipelines to automate efficient image creation and deployment.
Understanding multi-stage builds helps optimize CI/CD workflows by producing smaller, faster images that speed up delivery.
Software Build Systems
Multi-stage builds mirror concepts in build systems that separate compilation and packaging steps.
Knowing build systems clarifies why separating build and runtime environments reduces complexity and size.
Manufacturing Assembly Lines
Multi-stage builds resemble assembly lines where components are built separately then assembled into a final product.
Seeing multi-stage builds as assembly lines helps grasp the efficiency gained by modular, staged construction.
Common Pitfalls
#1 Including build tools in the final image by installing them in the last stage.
Wrong approach:
FROM python:3.9
RUN apt-get update && apt-get install -y build-essential
COPY . /app
RUN make -C /app
CMD ["python", "/app/app.py"]
Correct approach:
FROM python:3.9 AS builder
RUN apt-get update && apt-get install -y build-essential
COPY . /app
RUN make -C /app
FROM python:3.9-slim
COPY --from=builder /app /app
CMD ["python", "/app/app.py"]
Root cause:Not separating build and runtime stages causes build tools to remain in the final image.
#2 Copying the entire build directory instead of only the needed files.
Wrong approach:
COPY --from=builder /app /app
Correct approach:
COPY --from=builder /app/bin/app /app/bin/app
COPY --from=builder /app/config.yaml /app/config.yaml
Root cause:Copying everything without filtering includes unnecessary files, increasing image size.
#3 Using multiple FROM lines without naming stages, causing confusion.
Wrong approach:
FROM node:18
WORKDIR /app
COPY . .
RUN npm install && npm run build
FROM node:18
COPY --from=0 /app /app
Correct approach:
FROM node:18 AS builder
WORKDIR /app
COPY . .
RUN npm install && npm run build
FROM node:18
COPY --from=builder /app /app
Root cause:Not naming stages makes COPY --from references unclear and error-prone.
Key Takeaways
Multi-stage builds let you create smaller, cleaner container images by separating build and runtime steps.
Using multiple FROM commands, you can build your app in one stage and copy only what you need to the final image.
This approach reduces image size, speeds up deployment, and improves security by excluding build tools from runtime.
Understanding Docker's layer caching and careful file copying is essential to maximize multi-stage build benefits.
Multi-stage builds are a modern best practice in containerization, especially valuable in complex environments like machine learning.

Practice

1. What is the main benefit of using multi-stage builds in Docker?
easy
A. They enable Docker images to run on any operating system without modification.
B. They create smaller and cleaner Docker images by separating build and runtime stages.
C. They automatically update the base image to the latest version.
D. They allow running multiple containers simultaneously.

Solution

  1. Step 1: Understand multi-stage build concept

    Multi-stage builds separate the build environment from the runtime environment in Dockerfiles.
  2. Step 2: Identify the benefit of separation

    This separation removes unnecessary build tools from the final image, making it smaller and cleaner.
  3. Final Answer:

    They create smaller and cleaner Docker images by separating build and runtime stages. -> Option B
  4. Quick Check:

    Multi-stage builds = smaller images [OK]
Hint: Multi-stage builds reduce image size by splitting build and runtime [OK]
Common Mistakes:
  • Confusing multi-stage builds with running multiple containers
  • Thinking multi-stage builds update base images automatically
  • Assuming multi-stage builds change OS compatibility
2. Which of the following is the correct syntax to start a new stage named 'builder' in a Dockerfile?
easy
A. FROM ubuntu AS builder
B. STAGE builder FROM ubuntu
C. NEW STAGE builder FROM ubuntu
D. BUILD STAGE builder FROM ubuntu

Solution

  1. Step 1: Recall Dockerfile multi-stage syntax

    To start a new build stage, Dockerfile uses 'FROM <image> AS <name>'.
  2. Step 2: Match correct syntax

    Only 'FROM ubuntu AS builder' matches the correct syntax for naming a stage.
  3. Final Answer:

    FROM ubuntu AS builder -> Option A
  4. Quick Check:

    Stage naming uses 'FROM ... AS ...' [OK]
Hint: Use 'FROM image AS name' to start a new build stage [OK]
Common Mistakes:
  • Using 'STAGE' keyword which does not exist
  • Writing 'NEW STAGE' instead of 'FROM ... AS ...'
  • Confusing 'BUILD STAGE' with Dockerfile syntax
3. Given this Dockerfile snippet, what will be the size effect on the final image?
FROM golang:1.20 AS builder
WORKDIR /app
COPY . .
RUN go build -o myapp

FROM alpine:latest
COPY --from=builder /app/myapp /usr/local/bin/myapp
CMD ["myapp"]
medium
A. The final image will fail to build due to missing Go compiler in the second stage.
B. The final image will be large because it includes the entire Go build environment.
C. The final image will be small because it only copies the built binary from the builder stage.
D. The final image will include both Alpine and Go base images merged.

Solution

  1. Step 1: Analyze multi-stage build steps

    The first stage builds the Go binary using the full Go environment. The second stage uses a minimal Alpine image.
  2. Step 2: Understand what is copied to final image

    Only the compiled binary '/app/myapp' is copied from the builder stage to the final image, excluding build tools.
  3. Final Answer:

    The final image will be small because it only copies the built binary from the builder stage. -> Option C
  4. Quick Check:

    Copying only binary = smaller final image [OK]
Hint: Final image size shrinks by copying only needed files [OK]
Common Mistakes:
  • Assuming the entire build environment is included in final image
  • Thinking the build fails due to missing compiler in second stage
  • Believing base images merge into one large image
4. What is the main issue with this multi-stage Dockerfile snippet?
FROM node:18 AS builder
WORKDIR /app
COPY package.json .
RUN npm install
COPY . .
RUN npm run build

FROM node:18
COPY --from=builder /app/dist ./dist
CMD ["node", "./dist/server.js"]
medium
A. The COPY command in the second stage has incorrect source path syntax.
B. The first stage is missing a WORKDIR declaration.
C. The CMD command is missing square brackets for JSON array syntax.
D. The second stage should use a smaller base image like 'node:18-alpine' to reduce size.

Solution

  1. Step 1: Review base images used in both stages

    Both stages use 'node:18', which is a full Node image including build tools.
  2. Step 2: Suggest optimization for smaller final image

    Using a smaller base like 'node:18-alpine' in the second stage reduces image size by excluding unnecessary tools.
  3. Final Answer:

    The second stage should use a smaller base image like 'node:18-alpine' to reduce size. -> Option D
  4. Quick Check:

    Use lightweight base images in final stage [OK]
Hint: Use lightweight base images in final stage for smaller images [OK]
Common Mistakes:
  • Thinking COPY syntax is incorrect when it is valid
  • Believing CMD needs different syntax here
  • Assuming WORKDIR is missing in first stage
5. You want to build a Python app whose dependencies are installed in a build stage, while keeping the final image minimal. Which multi-stage Dockerfile snippet achieves this best?
hard
A.
FROM python:3.12 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt
COPY . .

FROM python:3.12-slim
COPY --from=builder /install /usr/local
COPY --from=builder /app /app
CMD ["python", "/app/app.py"]
B.
FROM python:3.12
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "/app/app.py"]
C.
FROM python:3.12 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "/app/app.py"]
D.
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "/app/app.py"]

Solution

  1. Step 1: Understand requirement for minimal final image

    Dependencies should be installed in a build stage, and only the installed packages and the app code should reach the final image, to keep it small.
  2. Step 2: Analyze options for multi-stage usage

    FROM python:3.12 AS builder
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --prefix=/install -r requirements.txt
    COPY . .
    
    FROM python:3.12-slim
    COPY --from=builder /install /usr/local
    COPY --from=builder /app /app
    CMD ["python", "/app/app.py"]
    uses a builder stage to install dependencies under /install, then copies the installed packages and the app into a slim final image. Without the first COPY --from line, the packages would stay behind in the builder stage and the app would fail to import them. The other options use a single stage.
  3. Final Answer:

    Option A correctly uses multi-stage build to keep final image minimal. -> Option A
  4. Quick Check:

    Install dependencies in builder, copy to slim final image [OK]
Hint: Install dependencies in builder stage, copy only needed files to slim image [OK]
Common Mistakes:
  • Installing dependencies directly in final image increasing size
  • Not using multi-stage build at all
  • Running app in builder stage instead of final stage