Continuous Profiling with Grafana Pyroscope on Kubernetes: Find CPU and Memory Bottlenecks in Production

Your App Is Slow — But Where Exactly?

You have metrics from Prometheus. You have traces from Jaeger. You have logs from Loki. And yet, when your production service spikes to 90% CPU at 2 AM, none of those tools tell you which function is burning cycles. That gap is what continuous profiling closes.

I ran into this exact problem while managing a Go microservice that processed batch jobs. Memory usage kept climbing — not fast enough to page anyone, but steadily enough that pods restarted every few days. Prometheus showed RSS growing. Logs showed nothing unusual. It took three days of manual pprof snapshots before I found a goroutine leak buried in a third-party library. After that, continuous profiling went straight onto my must-have list for any production Kubernetes setup.

Grafana Pyroscope is an open-source continuous profiling platform. It samples call stacks and builds flame graphs from your running applications 24/7 — no manual snapshot required. It integrates with Grafana’s observability stack and works across Go, Java, Python, Ruby, and more.

Why Traditional Profiling Falls Short in Kubernetes

The classic profiling workflow goes like this: reproduce the issue locally, attach a profiler, capture a snapshot, analyze it. Fine on a developer laptop. In Kubernetes, it breaks down fast:

Ephemeral pods — by the time you notice the problem, the offending pod may already be restarted and gone.
Non-reproducible spikes — production traffic patterns are hard to replicate. The spike that happened at 2 AM under real load won’t happen on your local machine.
Multiple replicas — a performance bug may only appear in one of ten pod replicas, making manual profiling a guessing game.
No historical data — without continuous profiling, you can only see what’s happening right now, not what happened before you looked.

Pyroscope addresses all of these by running a lightweight agent (or using eBPF) that continuously samples stack traces and ships them to a central store. Query any time window, compare before and after a deployment, correlate CPU spikes with the exact call stack responsible.

Core Concepts Before You Deploy

Flame Graphs

A flame graph is a visualization where each bar represents a function in a sampled call stack. Bar width maps to how much CPU time (or memory) that function consumed, including everything it called. A wide bar is therefore a lead rather than a verdict: the function may be expensive in its own code, or it may only be the parent of something expensive. Pyroscope builds these continuously, so you always have a historical record to query.
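
To make the width rule concrete, here is a minimal sketch of how sampled stacks fold into per-function counts. The sample data and function names (main, handle, parse, encode) are made up for illustration, and this is not how Pyroscope stores profiles internally. It only shows why a parent bar is always at least as wide as its children.

```python
from collections import Counter

def fold(samples):
    """Return (total, self_time) sample counts per function.

    Each sample is a call stack listed root-first, e.g. ["main", "handle", "parse"].
    total counts samples where the function appears anywhere in the stack (bar width).
    self_time counts samples where the function is the leaf (time in its own code).
    """
    total, self_time = Counter(), Counter()
    for stack in samples:
        if not stack:
            continue
        for fn in set(stack):  # set() so recursion does not double-count one sample
            total[fn] += 1
        self_time[stack[-1]] += 1
    return total, self_time

# Hypothetical samples, one list per captured stack
samples = [
    ["main", "handle", "parse"],
    ["main", "handle", "parse"],
    ["main", "handle", "encode"],
    ["main", "handle"],
]
total, self_time = fold(samples)
for fn in sorted(total, key=total.get, reverse=True):
    print(f"{fn:8} total={total[fn]} self={self_time[fn]}")
```

Here main and handle are the widest bars, but nearly all of their width comes from children. The self counts are what point at parse as the function doing the work.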

Pull vs Push Profiling

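Two modes are available. In pull mode, a scraper fetches profiles from your application’s /debug/pprof endpoints. That is Go-native and needs no instrumentation beyond importing net/http/pprof. Depending on the Pyroscope version, the scraper is the Pyroscope server itself or a collector such as Grafana Alloy. In push mode, you instrument your app with the Pyroscope SDK and it pushes profiles on its own schedule. For Go, pull mode is the easier entry point; Java, Python, and Ruby services normally use the SDK or agent for their language.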

eBPF Profiling

Can’t instrument the app, or don’t want to? Pyroscope can use eBPF to profile at the kernel level. This needs no code changes, but it requires a privileged DaemonSet, collects CPU profiles only, and produces the most readable stacks for compiled languages. It’s particularly useful for C/C++ services or when you want visibility across an entire node.

Deploying Pyroscope on Kubernetes

Prerequisites

Kubernetes 1.24+ cluster
Helm 3.x
kubectl configured
A running Grafana instance (or deploy one alongside Pyroscope)

Install via Helm

Add the Grafana Helm chart repository and deploy Pyroscope into a dedicated namespace:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

kubectl create namespace pyroscope

helm install pyroscope grafana/pyroscope \
  --namespace pyroscope \
  --set pyroscope.replicationFactor=1 \
  --set minio.enabled=true

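MinIO ships with the chart for in-cluster object storage. For production, swap it out for S3, GCS, or Azure Blob: set storage.backend to s3 (or gcs / azure) in the Pyroscope configuration, along with your bucket settings and credentials. The exact path for that configuration inside the Helm values depends on the chart version, so check the chart’s values.yaml.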

Verify the Deployment

kubectl get pods -n pyroscope
# NAME                          READY   STATUS    RESTARTS
# pyroscope-0                   1/1     Running   0
# pyroscope-minio-0             1/1     Running   0

kubectl port-forward svc/pyroscope -n pyroscope 4040:4040

Open http://localhost:4040 to access the Pyroscope UI directly.

Scraping a Go Application (Pull Mode)

If your Go service already imports net/http/pprof, Pyroscope can scrape it with zero code changes. Here’s a minimal Go service that exposes pprof:

package main

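import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof handlers on http.DefaultServeMux
)

func main() {
    // A nil handler means http.DefaultServeMux, which now includes the pprof routes.
    // ListenAndServe only returns on failure, so log the error and exit.
    log.Fatal(http.ListenAndServe(":8080", nil))
}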

Now configure Pyroscope to scrape it. Create a scrape-config.yaml:

scrapeConfigs:
  - jobName: my-go-service
    scrapeInterval: 15s
    staticConfigs:
      - targets:
          - my-go-service.default.svc.cluster.local:8080
    profilingConfig:
      pprof:
        enabled: true
        path: /debug/pprof/

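Apply it as part of your Pyroscope Helm values or as a ConfigMap mounted into the Pyroscope pod. Treat the key names above as illustrative: the scrape configuration schema has changed between Pyroscope releases, and current releases usually delegate scraping to Grafana Alloy, so check the documentation for the version you deploy.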
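
For intuition about what a pull-mode scrape does, the sketch below builds the URLs a scraper would request from the service above. The paths are the standard handlers registered by Go’s net/http/pprof. The helper name pprof_urls and the choice to make the CPU window equal to the scrape interval are assumptions for illustration, not Pyroscope’s actual scrape logic.

```python
from urllib.parse import urlencode

# Handlers registered by Go's net/http/pprof under /debug/pprof/
PPROF_PATHS = {
    "cpu": "profile",        # CPU profile, accepts ?seconds=N
    "heap": "heap",
    "goroutine": "goroutine",
    "block": "block",
    "mutex": "mutex",
}

def pprof_urls(target, interval_seconds=15, base_path="/debug/pprof/"):
    """Build one URL per profile type for a host:port target (hypothetical helper)."""
    urls = {}
    for name, path in PPROF_PATHS.items():
        url = f"http://{target}{base_path}{path}"
        if name == "cpu":
            # The CPU handler samples for the requested number of seconds.
            url += "?" + urlencode({"seconds": interval_seconds})
        urls[name] = url
    return urls

for name, url in pprof_urls("my-go-service.default.svc.cluster.local:8080").items():
    print(name, url)
```

Note that the block and mutex profiles stay empty unless the Go service enables them with runtime.SetBlockProfileRate and runtime.SetMutexProfileFraction.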

Instrumenting a Python App (Push Mode)

Python services push profiles using the Pyroscope SDK:

pip install pyroscope-io

import pyroscope

pyroscope.configure(
    application_name="my-python-service",
    server_address="http://pyroscope.pyroscope.svc.cluster.local:4040",
    tags={
        "env": "production",
        "version": "1.2.3",
    }
)

# Your application code runs normally here
# Pyroscope samples in the background

Don’t skip the tags dict. Those labels are how you filter and compare profiles later — for example, diffing version=1.2.2 against version=1.2.3 after a deploy to see exactly what regressed.
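
One way to keep those tags accurate is to derive them from the environment instead of hard-coding them. The sketch below builds the keyword arguments for pyroscope.configure from environment variables. The variable names (APP_ENV, APP_VERSION, POD_NAME, PYROSCOPE_URL) are hypothetical: you would have to set them in the pod spec yourself, for example POD_NAME through the Kubernetes Downward API.

```python
import os

DEFAULT_SERVER = "http://pyroscope.pyroscope.svc.cluster.local:4040"

def pyroscope_settings(env=os.environ):
    """Build kwargs for pyroscope.configure from env vars (names are assumptions)."""
    tags = {
        "env": env.get("APP_ENV", "development"),
        "version": env.get("APP_VERSION", "unknown"),
    }
    pod = env.get("POD_NAME")
    if pod:
        # Per-pod tag so one misbehaving replica can be isolated later.
        tags["pod"] = pod
    return {
        "application_name": "my-python-service",
        "server_address": env.get("PYROSCOPE_URL", DEFAULT_SERVER),
        "tags": tags,
    }

settings = pyroscope_settings(
    {"APP_ENV": "production", "APP_VERSION": "1.2.3", "POD_NAME": "worker-0"}
)
print(settings)
# With the SDK installed: import pyroscope; pyroscope.configure(**settings)
```

Passing the environment in as a plain mapping keeps the function easy to test without touching the real process environment.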

Connecting Pyroscope to Grafana

Add Pyroscope as a data source in Grafana:

Go to Connections → Data sources → Add data source (Configuration → Data Sources in older Grafana versions)
Search for Grafana Pyroscope
Set the URL to http://pyroscope.pyroscope.svc.cluster.local:4040
Click Save & Test
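
If you would rather script the steps above than click through them, Grafana’s HTTP API accepts a data source definition via POST /api/datasources. The sketch below only builds and prints the request body. The plugin id grafana-pyroscope-datasource is an assumption to verify against your Grafana version, and the helper name is hypothetical.

```python
import json

def pyroscope_datasource(url, name="Pyroscope"):
    """Build a request body for Grafana's POST /api/datasources (hypothetical helper)."""
    return {
        "name": name,
        "type": "grafana-pyroscope-datasource",  # assumed plugin id; verify in your Grafana
        "access": "proxy",  # Grafana's backend makes the requests, not the browser
        "url": url,
    }

payload = pyroscope_datasource("http://pyroscope.pyroscope.svc.cluster.local:4040")
print(json.dumps(payload, indent=2))
```

Sending the body requires an authenticated request to your Grafana instance, which is left out here because the credentials and hostname are specific to your setup.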

Once connected, open the Explore view, select the Pyroscope data source, pick your application, and start browsing flame graphs. Grafana’s correlations feature takes this further: wire up Prometheus and Loki alongside Pyroscope in a single panel. When a CPU spike fires, click straight through from the metric to the flame graph. No tab-switching, no guesswork.

Deploying the eBPF Agent as a DaemonSet

For language-agnostic profiling of every pod on a node, deploy the Pyroscope eBPF agent:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: pyroscope-ebpf
  namespace: pyroscope
spec:
  selector:
    matchLabels:
      app: pyroscope-ebpf
  template:
    metadata:
      labels:
        app: pyroscope-ebpf
    spec:
      hostPID: true
      containers:
        - name: pyroscope-ebpf
          image: grafana/pyroscope-ebpf:latest  # pin to a specific version in production
          securityContext:
            privileged: true
          env:
            - name: PYROSCOPE_SERVER_ADDRESS
              value: "http://pyroscope.pyroscope.svc.cluster.local:4040"
          volumeMounts:
            - name: host-sys
              mountPath: /sys
              readOnly: true
      volumes:
        - name: host-sys
          hostPath:
            path: /sys

privileged: true is required for the eBPF agent in this setup. Review your cluster’s Pod Security Admission settings before rolling this out in a shared environment (PodSecurityPolicy was removed in Kubernetes 1.25), and pin the image tag rather than using latest.

Reading the Flame Graphs: A Practical Walkthrough

When you open a flame graph in Pyroscope or Grafana, go straight to the widest bars. Those are your bottlenecks. A few patterns worth knowing:

A single wide bar with no wide children: one function consuming disproportionate CPU in its own code. Start there.
Many narrow bars all rooted in the same parent: a function called in a tight loop. Caching or batching usually helps.
GC-related frames taking >10% of CPU: in Go or Java, this often means too many short-lived allocations on hot paths. Look for unnecessary copies or intermediate slices.
I/O or lock-wait frames dominating: the bottleneck isn’t CPU, it’s latency. CPU profiles only sample on-CPU time, so look for this in wall-clock, block, or mutex profiles, where the flame graph will point to which database calls or HTTP requests are blocking.
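
The GC rule of thumb can be checked mechanically once you have the stacks. This is a minimal sketch, assuming samples are root-first lists of function names and that GC work appears under Go runtime frames such as runtime.gcBgMarkWorker. The prefix list and the sample data are illustrative; adjust them for your runtime.

```python
# Frame-name prefixes treated as GC work (assumption, Go-specific)
GC_PREFIXES = ("runtime.gc", "runtime.bgsweep")

def gc_share(samples):
    """Fraction of samples whose stack contains at least one GC frame."""
    if not samples:
        return 0.0
    hits = sum(
        1 for stack in samples
        if any(fn.startswith(GC_PREFIXES) for fn in stack)
    )
    return hits / len(samples)

# Hypothetical CPU samples: 3 background GC, 1 GC assist, 16 application work
samples = (
    [["runtime.gcBgMarkWorker", "runtime.gcDrain"]] * 3
    + [["main.main", "main.process", "runtime.mallocgc", "runtime.gcAssistAlloc"]]
    + [["main.main", "main.process"]] * 16
)
share = gc_share(samples)
print(f"GC share: {share:.0%}")
if share > 0.10:
    print("GC above 10%: look for short-lived allocations on hot paths")
```

Counting a sample once, however many GC frames it contains, keeps the share comparable to the bar widths you see in the flame graph.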

The diff view is worth getting familiar with early. Deploy a release, then compare its flame graph against the previous version. You see exactly what changed in the performance profile — not just that latency went up by 8%, but which function is now eating three times more CPU than before.
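
The idea behind a diff view can be sketched in a few lines. The example below compares two hypothetical profiles, given as function name to sample count, and ranks functions by how much their share of total samples changed. The data and names are invented, and Pyroscope’s own diff works on full stacks rather than flat counts.

```python
def profile_diff(before, after):
    """Rank functions by change in share of samples between two profiles.

    before/after map function name -> sample count. Shares are used instead of
    raw counts because the two windows rarely contain the same number of samples.
    """
    total_b = sum(before.values()) or 1
    total_a = sum(after.values()) or 1
    rows = []
    for fn in set(before) | set(after):
        share_b = before.get(fn, 0) / total_b
        share_a = after.get(fn, 0) / total_a
        rows.append((fn, share_b, share_a, share_a - share_b))
    # Largest increase first: the most likely regression
    return sorted(rows, key=lambda r: r[3], reverse=True)

# Hypothetical sample counts for two releases
v122 = {"parse": 100, "encode": 300, "dbQuery": 600}
v123 = {"parse": 300, "encode": 250, "dbQuery": 450}
for fn, b, a, delta in profile_diff(v122, v123):
    print(f"{fn:8} {b:6.1%} -> {a:6.1%} ({delta:+.1%})")
```

In this made-up data, parse goes from 10% to 30% of samples, which is the kind of tripling the diff view makes obvious at a glance.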

Wrapping Up

Metrics tell you something is wrong. Traces tell you where in the request lifecycle. Profiles tell you why — which lines of code are actually responsible. Pyroscope closes that gap.

Setup is minimal: one Helm install, a scrape config or a few SDK lines, one Grafana data source. Next time a pod eats unexpected CPU at 2 AM, you won’t be guessing. You’ll have a flame graph showing the exact function responsible, with historical data going back as far as your retention settings allow. Check the retention default for your Pyroscope version and size it against your storage limits.

Start with pull mode on your Go services. It’s the easiest entry point, with no instrumentation beyond the pprof import. Once you’ve caught your first real bottleneck from a flame graph, expand to push mode for Java or Python, or add the eBPF DaemonSet for full-node coverage.