Automating SSL/TLS Certificate Lifecycle Management with cert-manager on Kubernetes

Security tutorial - IT technology blog

The 3 AM Certificate Expiry Incident

After my server got hit by SSH brute-force attacks at midnight, security became my first concern on every new project. So when I joined a team running Kubernetes in production, my opening question was: “Who’s managing your SSL certificates?” The answer was a shared Google Sheet with expiry dates and a Slack reminder bot. Three weeks later, a certificate expired over a weekend and took down the payment gateway for six hours.

Six hours of downtime. From a cert that everyone knew would expire. That’s when it clicked: manual certificate management at scale isn’t a process — it’s a countdown timer you forgot to check.

Why Manual Certificate Management Breaks Down

The problem isn’t laziness. It’s that complexity grows faster than any spreadsheet can track.

A small cluster might have 10 services with TLS. A medium production environment can easily have 50–100 certificates across multiple namespaces, multiple domains, internal services, and wildcard certs. Each one has its own expiry date, renewal window, CA issuer, and secret location. No human tracks that reliably — not without mistakes, and definitely not over a long weekend.

Three failure modes show up again and again:

  • Missed renewals — The reminder fires, gets buried in Slack, and nobody acts on it before Friday evening.
  • Secret name mismatch — Someone manually updates the cert but the Kubernetes Secret name doesn’t match what the Ingress controller expects. Traffic breaks silently.
  • Private CA drift — Internal services use a self-signed CA, but the CA cert itself expires after 2 years. Nobody tracked it.

Every one of these is preventable with automation.
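To see what the manual process actually involves, here's a minimal sketch of the kind of expiry check a spreadsheet workflow depends on someone remembering to run. The paths and the 14-day window are illustrative, and a short-lived self-signed cert stands in for a real one:

```shell
# Stand-in for a real certificate: self-signed, valid 20 days
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo.key \
  -out /tmp/demo.crt -days 20 -subj "/CN=demo.example.com" 2>/dev/null

# -checkend N exits 0 if the cert is still valid N seconds from now
if openssl x509 -checkend $((14*24*3600)) -noout -in /tmp/demo.crt; then
  echo "OK: more than 14 days of validity left"
else
  echo "WARN: expires within 14 days, renew now"
fi
```

Now multiply that by 50–100 certificates across namespaces and domains, each needing its own check, and it's clear why a spreadsheet stops scaling.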

What Are Your Options?

cert-manager isn’t the only tool here. Understanding the alternatives makes it easier to justify the choice.

Option 1: Certbot on each node

The classic approach. Run certbot renew as a cron job on every VM. Works fine for static servers, but it’s a poor fit for Kubernetes. Certificates live outside the cluster and need to be manually synced into Secrets. If a pod restarts and mounts a stale Secret, you’re debugging TLS errors at the worst possible time.

Option 2: External secrets + Vault

HashiCorp Vault can issue and renew certificates through its PKI secrets engine. It’s excellent for large enterprises with complex CA hierarchies. For smaller teams, though, Vault itself needs to be deployed, hardened, unsealed, and maintained — significant overhead before you get any benefit from it.

Option 3: cert-manager

cert-manager runs inside your cluster as a Kubernetes-native controller. It watches custom resources (Certificate, Issuer, ClusterIssuer) and handles the entire lifecycle: request, issuance, storage as a Kubernetes Secret, and automatic renewal. No manual steps. It supports Let’s Encrypt (ACME), private CAs, Vault, and Venafi out of the box.

For most teams on Kubernetes, cert-manager is the obvious fit. It integrates naturally with Ingress annotations and requires almost no ongoing maintenance once it’s running.

Setting Up cert-manager: Step by Step

Step 1: Install cert-manager

Use the official Helm chart. You’ll need Kubernetes 1.22+ and Helm 3.

# Add the Jetstack Helm repo
helm repo add jetstack https://charts.jetstack.io
helm repo update

# Install cert-manager with CRDs
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true

Before moving on, confirm the pods are healthy:

kubectl get pods -n cert-manager

You should see three pods: cert-manager, cert-manager-cainjector, and cert-manager-webhook, all in Running state. If the webhook pod is stuck, wait 60 seconds — it initializes last.

Step 2: Configure a Let’s Encrypt ClusterIssuer

A ClusterIssuer is cluster-wide. A plain Issuer is namespace-scoped. For public-facing services, Let’s Encrypt with HTTP-01 challenge is the simplest starting point.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: [email protected]
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
    - http01:
        ingress:
          ingressClassName: nginx

Apply it, then check the status:

kubectl apply -f clusterissuer-letsencrypt.yaml
kubectl get clusterissuer letsencrypt-prod -o yaml

Look for Ready: True in the status conditions. If it’s stuck, check the controller logs: kubectl logs -n cert-manager deploy/cert-manager. Nine times out of ten it’s a typo in the email or server URL.

Step 3: Issue Your First Certificate via Ingress Annotation

Add a single annotation to your Ingress. cert-manager detects it and automatically creates a Certificate resource — no extra manifests required.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.example.com
    secretName: my-app-tls
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app
            port:
              number: 80

Within 60–90 seconds, cert-manager creates the Secret my-app-tls in the production namespace. Watch it happen in real time:

kubectl get certificate -n production -w
kubectl describe certificate my-app-tls -n production

Renewal kicks in automatically when the certificate is 30 days from expiry (for 90-day Let’s Encrypt certs). You don’t touch it again.
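Under the hood, cert-manager's ingress-shim turns that annotation into a Certificate resource. You never have to write it yourself, but knowing roughly what gets generated helps when debugging; for the Ingress above it would look approximately like this:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-app-tls
  namespace: production
spec:
  secretName: my-app-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - app.example.com
```

Running kubectl describe on this resource shows the issuance events, which is usually the fastest way to find out why a certificate is stuck.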

Step 4: Private CA for Internal Services

Internal microservices that speak mTLS need certificates too. They shouldn't use Let's Encrypt — it only issues for publicly resolvable domains. Use a private CA issuer instead.

Generate a root CA first:

# Generate CA private key
openssl genrsa -out ca.key 4096

# Generate CA certificate (valid 10 years)
openssl req -new -x509 -days 3650 -key ca.key -out ca.crt \
  -subj "/CN=Internal Cluster CA/O=MyOrg"
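Before storing the CA, it's worth a quick sanity check that the subject and validity window came out as intended:

```shell
# Print the subject and expiry of the CA cert generated above;
# you should see the CN/O you set and a notAfter roughly 10 years out
openssl x509 -in ca.crt -noout -subject -enddate
```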

Store it as a Kubernetes Secret:

kubectl create secret tls internal-ca-secret \
  --cert=ca.crt \
  --key=ca.key \
  -n cert-manager

Create a ClusterIssuer backed by that CA:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca
spec:
  ca:
    secretName: internal-ca-secret

Internal services now request certificates from your private CA using the same Certificate resource pattern — just pointing to internal-ca as the issuer.

Step 5: Explicit Certificate Resources for Non-Ingress Workloads

Some services never touch an Ingress — gRPC backends, PostgreSQL with TLS, internal APIs. For those, create a Certificate resource directly:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: postgres-tls
  namespace: database
spec:
  secretName: postgres-tls-secret
  issuerRef:
    name: internal-ca
    kind: ClusterIssuer
  dnsNames:
  - postgres.database.svc.cluster.local
  duration: 720h       # 30 days
  renewBefore: 168h    # renew 7 days before expiry
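The workload then mounts the resulting Secret like any other TLS secret. A minimal sketch (the image name and mount path are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres
  namespace: database
spec:
  containers:
  - name: postgres
    image: postgres:16
    volumeMounts:
    - name: tls
      mountPath: /etc/postgres/tls
      readOnly: true
  volumes:
  - name: tls
    secret:
      secretName: postgres-tls-secret
```

cert-manager writes tls.crt, tls.key, and ca.crt into the Secret, and the kubelet refreshes the mounted files after each renewal. Note that the application itself may still need a reload or restart to pick up the new certificate.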

Monitoring Certificate Health

Automation handles renewals, but you still want visibility. cert-manager exposes Prometheus metrics by default. Start with a quick audit across all namespaces:

# List all certificates
kubectl get certificates --all-namespaces

# Flag anything that isn't Ready (the header row also prints)
kubectl get certificates --all-namespaces | grep -v True

For alerting, add a Prometheus rule that fires when a certificate is under 14 days from expiry and hasn’t renewed yet:

- alert: CertificateExpirationWarning
  expr: certmanager_certificate_expiration_timestamp_seconds - time() < 1209600
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "Certificate expiring in less than 14 days"

Fourteen days gives you two full weeks to investigate without a fire drill.
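Expiry isn't the only failure mode worth alerting on. cert-manager also exports a readiness metric, so you can catch certificates stuck in a not-Ready state long before expiry becomes a concern. A companion rule along these lines (the 30-minute window is a starting point, tune it to taste):

```yaml
- alert: CertificateNotReady
  expr: certmanager_certificate_ready_status{condition="False"} == 1
  for: 30m
  labels:
    severity: warning
  annotations:
    summary: "Certificate {{ $labels.name }} in {{ $labels.namespace }} is not Ready"
```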

Common Mistakes to Avoid

  • Using staging Let’s Encrypt in production — The staging ACME endpoint (https://acme-staging-v02.api.letsencrypt.org/directory) issues certificates that browsers don’t trust. Use it for testing, then swap the server URL before going live.
  • HTTP-01 behind a firewall — HTTP-01 requires your domain to be publicly reachable on port 80. Private clusters or air-gapped environments need DNS-01 challenge instead — cert-manager supports Route53, Cloudflare, and others.
  • Mixing Issuer and ClusterIssuer scope — A plain Issuer only works within its own namespace. Reference it from a different namespace and you’ll get a cryptic “issuer not found” error. When in doubt, use ClusterIssuer.
  • Forgetting your private CA expiry — cert-manager manages the leaf certificates it issues. It won’t warn you when the CA cert itself expires. Set a calendar reminder for your CA expiry, or wire up the Prometheus alert above to catch it early.
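For the DNS-01 case mentioned above, here's a sketch of a Cloudflare-backed ClusterIssuer. It assumes you've already created a Secret named cloudflare-api-token in the cert-manager namespace holding a scoped API token under the key api-token:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    email: [email protected]
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-dns-account-key
    solvers:
    - dns01:
        cloudflare:
          apiTokenSecretRef:
            name: cloudflare-api-token
            key: api-token
```

Because DNS-01 proves control of the domain via a TXT record rather than an inbound HTTP request, it works for private clusters and is also the only challenge type that can issue wildcard certificates.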

The Result: Zero-Touch Certificate Management

After rolling this out across our cluster, certificate management disappeared as a recurring task. No spreadsheet. No Slack reminders. No weekend pages.

The Prometheus alert fired twice in eight months. Both times it was for services we’d temporarily moved outside cert-manager’s control. Both times we caught it with more than 10 days to spare — enough to fix it calmly during business hours.

The setup is minimal: install cert-manager, create one ClusterIssuer for public domains and one for internal services, add a single annotation to your Ingress resources. Renewals, Secret updates, and Ingress reloads happen without you. Your job becomes reviewing the occasional alert rather than hunting down expiry dates.

That’s a much better place to be at 3 AM.
