Quick Start: Getting Jaeger Up in 5 Minutes
Three hours into a production outage, scrolling through millions of logs, you realize the hard way: logs show the symptoms, but traces show the cause. When a single user click triggers five API calls and two database queries, you need a map, not a list. Let’s build that map using Jaeger and Docker in under five minutes.
We’ll use a docker-compose.yml file to spin up the Jaeger ‘all-in-one’ image, a Swiss Army knife for developers that bundles the UI, collector, and query engine into one lightweight container.
version: '3.8'
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686" # Jaeger UI
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
    environment:
      - COLLECTOR_OTLP_ENABLED=true
  service-a:
    image: node:18
    working_dir: /app
    volumes:
      - ./service-a:/app
    command: npm start
    environment:
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317
    depends_on:
      - jaeger
Run docker-compose up -d and head over to http://localhost:16686. In my experience, this setup is the turning point for most teams. It transforms debugging from a guessing game into a visual science.
Deep Dive: The Anatomy of a Trace
To master Jaeger, you need to understand two concepts: Spans and Traces. Think of a Trace as the entire story of a request. A Span is a single chapter—like a specific database query or a 200ms API call to a downstream service.
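To make this concrete, here is a minimal sketch using the OpenTelemetry API (the tracer name is an arbitrary example): a parent span wraps a child span, and the Jaeger UI renders the child nested under its parent.

const { trace } = require('@opentelemetry/api');
const tracer = trace.getTracer('order-service');

// One trace, two spans: startActiveSpan makes the parent the active
// context, so the inner startSpan call becomes its child.
tracer.startActiveSpan('handle-checkout', (parent) => {
  const child = tracer.startSpan('db-query'); // e.g., a specific database query
  child.end();
  parent.end();
});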
The Logic of Observation
Context propagation is the secret sauce. When Service A calls Service B, it must pass along a “Trace ID.” Jaeger doesn’t use magic to link your services. Instead, your code injects this ID into the HTTP headers, typically via the W3C traceparent header.
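You rarely write this injection by hand (HTTP auto-instrumentation does it for you), but here is a rough sketch of what happens under the hood, assuming the SDK has registered the default W3C propagator:

const { context, propagation } = require('@opentelemetry/api');

// Copy the active trace context into an outgoing request's headers.
const headers = {};
propagation.inject(context.active(), headers);
// headers.traceparent now looks something like:
// 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01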
Forget the legacy Jaeger SDKs; OpenTelemetry (OTel) is the industry standard now. Jaeger supports it natively, which future-proofs your stack. Here is how you instrument a Node.js service to talk to your Dockerized Jaeger instance:
// instrumentation.js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
  }),
  serviceName: 'order-service',
});

sdk.start();
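One gotcha: this file must run before your application code so the SDK can patch modules as they load. With the compose file above, that typically means a start command along the lines of node --require ./instrumentation.js server.js, where server.js stands in for your actual entry point.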
The all-in-one Docker image is great because it skips the hassle of setting up Cassandra or Elasticsearch. Just remember: it stores everything in RAM. If you restart that container, your data vanishes. Use it for your laptop, but keep it far away from your production environment.
Advanced Usage: Handling Production Traffic
Scale changes everything. Tracing generates a mountain of data. If your system handles 1,000 requests per second and each request creates 10 spans, you’re looking at 10,000 spans every second. That volume will overwhelm a basic in-memory setup in minutes.
Smart Sampling
You don’t need to record every single request. In production, we use Probabilistic Sampling. Recording just 1% of your traffic is usually enough to spot performance trends without melting your storage budget.
# Setting 1% sampling via environment variables
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.01
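If you would rather configure this in code than through environment variables, here is an equivalent sketch using the OTel SDK’s samplers (the 0.01 ratio matches the 1% above):

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { ParentBasedSampler, TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');

// Sample 1% of new traces, but always honor the parent's decision
// so you never end up with half-recorded traces.
const sdk = new NodeSDK({
  sampler: new ParentBasedSampler({
    root: new TraceIdRatioBasedSampler(0.01),
  }),
});
sdk.start();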
Moving to Persistent Storage
For a serious deployment, you must split the Jaeger Collector and Query services. The Collector grabs data from your apps and writes it to Elasticsearch. The Query service then pulls that data back for the UI. This separation allows you to scale the components independently as your traffic grows.
# Production-ready storage config
jaeger-collector:
  image: jaegertracing/jaeger-collector
  environment:
    - SPAN_STORAGE_TYPE=elasticsearch
    - ES_SERVER_URLS=http://elasticsearch:9200
jaeger-query:
  image: jaegertracing/jaeger-query
  environment:
    - SPAN_STORAGE_TYPE=elasticsearch
    - ES_SERVER_URLS=http://elasticsearch:9200
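Note that both components point at the same Elasticsearch cluster: the collector writes spans in, the query service reads them back for the UI. The elasticsearch service itself isn’t shown here; you would define it in the same compose file or point ES_SERVER_URLS at an existing cluster.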
Practical Lessons from the Trenches
I’ve managed hundreds of microservices. Here is what I wish I had known before starting with distributed tracing.
1. Custom Tags Save Lives
Standard traces are fine, but business tags are better. Always attach IDs like customer_id or order_id to your spans. When a high-value client complains about a slow checkout at 2 AM, you can find their specific trace in seconds.
const { trace } = require('@opentelemetry/api');
const tracer = trace.getTracer('order-service');

const span = tracer.startSpan('process-payment');
span.setAttribute('order.id', '12345');
span.setAttribute('payment.provider', 'stripe');
span.end();
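In the Jaeger UI, you can then search by tag (for example, order.id=12345 in the Tags field) and jump straight to that customer’s trace.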
2. Watch the Performance Tax
Tracing isn’t free, but it’s cheap if you’re smart. Using gRPC (port 4317) is significantly more efficient than HTTP (port 4318). In our benchmarks, OTel instrumentation usually adds less than 2% CPU overhead. Just ensure your exporters are non-blocking so they don’t stall your app if Jaeger is down.
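The Node SDK already batches exports by default, but making the processor explicit lets you tune the queue so a dead collector drops spans instead of stalling requests. A minimal sketch, with illustrative queue numbers:

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');

const sdk = new NodeSDK({
  spanProcessor: new BatchSpanProcessor(
    new OTLPTraceExporter({ url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT }),
    {
      maxQueueSize: 2048,         // when full, new spans are dropped rather than blocking
      scheduledDelayMillis: 5000, // flush to the collector every 5 seconds
    }
  ),
});
sdk.start();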
3. Trust the Dependency Graph
The ‘Dependencies’ tab in the Jaeger UI is a hidden gem. It builds a live map of your architecture based on actual traffic. This is usually far more accurate than any README or architectural diagram your team hasn’t updated in six months.
4. Hunting “Ghost” Spans
If Service A calls Service B but B never shows up in the trace, check your middleware. A proxy or load balancer is likely stripping the headers. Always verify that your trace context survives every hop, especially when crossing from a public gateway to a private network.
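A quick way to confirm is to log the incoming header at Service B’s edge. A throwaway diagnostic for an Express-style service (remove it once you’ve found the culprit):

const express = require('express');
const app = express();

// Log whether the W3C trace context survived the hop into this service.
app.use((req, res, next) => {
  console.log('traceparent:', req.headers['traceparent'] || 'MISSING');
  next();
});

app.listen(3000);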
Distributed tracing is no longer a luxury. It is the backbone of modern observability. Start with Docker to get a feel for it, then weave it deeply into your workflow to kill those “impossible” bugs for good.

