How to Write an Efficient Dockerfile: Tips from Real-World Experience

DevOps tutorial - IT technology blog

A poorly written Dockerfile is one of the first things I look at when a CI/CD pipeline is crawling. I’ve seen 15-minute builds that should run in under 3, and 900 MB images that had no business being over 30 MB. These aren’t edge cases. They’re what happens when Dockerfiles get written once and never revisited.

This guide covers the approaches I’ve seen in production — what works, what doesn’t, and how I write Dockerfiles today.

Approach Comparison: Naive vs. Optimized Dockerfiles

Most developers start with a “just make it work” Dockerfile. It looks something like this:

FROM ubuntu:latest
RUN apt-get update && apt-get install -y python3 python3-pip
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["python3", "app.py"]

It works. But the problems add up fast. The image balloons to 600–900 MB. Every rebuild reinstalls all dependencies after even a one-line code change. And production ends up shipping tools and files it never uses.

The optimized approach combines a few techniques:

  • Minimal base images (Alpine, slim variants, or distroless)
  • Layer caching awareness — ordering instructions from least to most frequently changed
  • Multi-stage builds — separating build-time from runtime dependencies
  • .dockerignore — keeping the build context lean

Pros & Cons of Each Approach

Naive Approach

  • Pro: Quick to write, easy to understand
  • Con: Large image size (often 500 MB+)
  • Con: Slow rebuilds — every code change triggers a full dependency reinstall
  • Con: Build tools, compilers, and dev packages end up in the production image
  • Con: Larger attack surface for security vulnerabilities

Optimized Approach

  • Pro: Images as small as 50–150 MB (sometimes under 20 MB with distroless)
  • Pro: Fast rebuilds thanks to effective layer caching
  • Pro: Clean production image — only runtime dependencies, nothing else
  • Con: Slightly more complex to set up initially
  • Con: Alpine-based images can cause subtle issues with glibc-dependent binaries

Writing good Dockerfiles is one of the highest-leverage habits in a DevOps workflow. Shorter CI runs, lower registry costs, fewer production surprises — all from one file.

Recommended Setup

Choose the Right Base Image

Skip ubuntu:latest or debian:latest unless you specifically need that environment. Here’s the hierarchy I use:

  • python:3.12-slim — slimmed official image, good balance of compatibility and size
  • python:3.12-alpine — smallest, but watch for packages that need glibc
  • gcr.io/distroless/python3 — no shell, no package manager, minimal attack surface (best for security-critical production)

For compiled languages like Go or Rust, multi-stage builds let you compile on a full image and copy only the final binary into a distroless or scratch base.

Layer Caching: Order Matters

Docker caches each layer. Change one layer and everything after it rebuilds from scratch. The rule: put instructions that change least at the top.

Wrong order (breaks cache on every code change):

COPY . /app
RUN pip install -r requirements.txt

Correct order (dependencies cached separately):

COPY requirements.txt /app/
RUN pip install -r /app/requirements.txt
COPY . /app

Now pip install only reruns when requirements.txt changes — not on every code edit.
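If you build with BuildKit (the default builder in recent Docker releases), a cache mount can speed things up even when requirements.txt does change, because pip's download cache survives between builds. A sketch of the idea:

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
# BuildKit persists /root/.cache/pip across builds, so a changed
# requirements.txt reinstalls from the local wheel cache, not the network
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
COPY . /app
```

Note that this approach replaces --no-cache-dir / PIP_NO_CACHE_DIR — disabling pip's cache would defeat the cache mount.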

Use Multi-Stage Builds

Multi-stage builds are the single biggest win for compiled or build-heavy apps. Here’s a real Go service example:

# Stage 1: Build
FROM golang:1.22-alpine AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /app/server ./cmd/server

# Stage 2: Production image
FROM gcr.io/distroless/static-debian12
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]

The final image has only the compiled binary — no Go toolchain, no source code, no package manager. This pattern took one service from 900 MB down to under 20 MB.
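If you target scratch instead of distroless, keep in mind the final image contains literally nothing — no CA certificates, no timezone data, no /tmp. A sketch of the extra copies an HTTPS-calling service typically needs (the ca-certificates path shown is the Alpine default):

```dockerfile
# Stage 1: Build
FROM golang:1.22-alpine AS builder
RUN apk add --no-cache ca-certificates
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /app/server ./cmd/server

# Stage 2: scratch ships nothing, so TLS roots must be copied in explicitly
FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]
```

distroless/static already bundles CA certificates and a nonroot user, which is why it's usually the safer default of the two.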

Always Use .dockerignore

Docker sends your entire project directory to the daemon on every build by default. On a Node.js project with node_modules, that build context can jump from a few KB to several hundred MB. A .dockerignore fixes that immediately.

A minimal .dockerignore:

.git
.gitignore
*.md
__pycache__
*.pyc
.env
.env.*
tests/
docs/
node_modules/

Implementation Guide

Python Application — Full Example

# syntax=docker/dockerfile:1
FROM python:3.12-slim AS base

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1

WORKDIR /app

# Install dependencies first (cached layer)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY src/ ./src/

# Run as non-root user
RUN adduser --disabled-password --gecos "" appuser
USER appuser

EXPOSE 8000
CMD ["python", "-m", "src.main"]

Node.js Application — Multi-Stage

# Stage 1: Install production dependencies
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

# Stage 2: Minimal runtime
FROM node:20-alpine
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY src/ ./src/

RUN addgroup -S app && adduser -S app -G app
USER app

EXPOSE 3000
CMD ["node", "src/index.js"]

Common Mistakes to Avoid

  • Don’t use latest tags — pin specific versions like python:3.12.3-slim. latest breaks reproducibility and will eventually bite you in CI.
  • Don’t run as root — create and switch to a non-root user before CMD/ENTRYPOINT.
  • Don’t split apt-get update and apt-get install into separate RUN commands — Docker caches the update layer independently, which leads to stale package lists and broken installs:
# Bad
RUN apt-get update
RUN apt-get install -y curl

# Good
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
  • Don’t store secrets in the Dockerfile — use build args only for non-sensitive data. Pass real secrets via environment variables at runtime, or use Docker secrets.
  • Don’t skip HEALTHCHECK — without it, Docker and Docker Swarm can’t tell whether your container is merely running or actually ready to serve traffic. (Kubernetes is the exception: it ignores the Dockerfile HEALTHCHECK in favor of its own liveness and readiness probes.) Also note that a curl-based check only works if curl exists in the image — it won’t in distroless bases:
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD curl -f http://localhost:8000/health || exit 1
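On the secrets point above: when a build step genuinely needs a credential — say, a token for a private package registry — BuildKit secret mounts expose it to a single RUN step without it ever landing in a layer. A sketch, where npm_token is a secret id I've made up for illustration:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
# The secret is mounted as a file at /run/secrets/<id> for this
# step only; it is not stored in any layer or in the build history
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN="$(cat /run/secrets/npm_token)" npm ci
```

Build it with the secret supplied from a local file: docker build --secret id=npm_token,src=./token.txt .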

Verify Your Image Size

After building, check what you ended up with:

docker build -t myapp:latest .
docker images myapp:latest
docker history myapp:latest

For a deeper breakdown, use dive:

dive myapp:latest

dive breaks down space usage layer by layer. One common gotcha: files added in one RUN step and deleted in a later one still consume space in the final image. dive flags those clearly.
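The classic case of that gotcha is downloading an archive in one RUN step and deleting it in a later one. A sketch of the broken and fixed versions (the URL is a placeholder):

```dockerfile
# Bad: the archive is baked into the first layer forever,
# even though a later layer deletes it
RUN curl -fsSL -o /tmp/tool.tar.gz https://example.com/tool.tar.gz
RUN tar -xzf /tmp/tool.tar.gz -C /usr/local && rm /tmp/tool.tar.gz

# Good: download, extract, and delete within a single layer
RUN curl -fsSL -o /tmp/tool.tar.gz https://example.com/tool.tar.gz \
    && tar -xzf /tmp/tool.tar.gz -C /usr/local \
    && rm /tmp/tool.tar.gz
```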

Quick Reference Checklist

  • Use a slim or Alpine base image
  • Pin exact image versions
  • Copy dependency files before source code
  • Use multi-stage builds for compiled or build-heavy apps
  • Create and use a non-root user
  • Add a .dockerignore file
  • Clean up package manager caches in the same RUN layer
  • Add a HEALTHCHECK
  • Never bake secrets into the image

These aren’t abstract best practices — each one maps to a real problem I’ve run into. Get this right and your containers build faster, cost less to store, and carry less risk in production. Worth the 30 minutes it takes to set up properly.
