Policy as Code on Kubernetes with Kyverno: Automate Config and Security Checks

DevOps tutorial - IT technology blog

It was 2:47 AM when the alert fired. A developer had pushed a deployment without resource limits — again. The pod went rogue, consumed all CPU on the node, and took down three other services. We’d had this conversation six times in the last month. Post-mortems always ended the same way: “We need better guardrails.”

That night I finally sat down and implemented what I’d been putting off for weeks: Policy as Code with Kyverno. Six months later, zero repeat incidents of that type. Here’s what production use actually taught me.

Why Manual Reviews Don’t Scale

Most teams start with code review as their safety net. Someone eyeballs the YAML before it merges. That works when you have two engineers and one cluster. It breaks down fast once you have fifteen engineers deploying to four environments.

Here’s the core problem: policy enforcement at review time is advisory, not mandatory. Someone approves the PR at midnight because CI is green and they’re tired. The missing securityContext slips through. The pod runs as root in production.

Policy as Code moves enforcement to the cluster itself. The API server rejects non-compliant resources before they ever get scheduled. No exceptions, no fatigue, no 2 AM surprises.

Approach Comparison: OPA/Gatekeeper vs Kyverno vs Admission Webhooks

When I started evaluating options, three main approaches came up. Each solves the same problem differently.

Open Policy Agent (OPA) + Gatekeeper

OPA is the older, more battle-hardened option. It uses Rego — a purpose-built policy language — and Gatekeeper as the Kubernetes integration layer. The policy ecosystem is mature, and large organizations with dedicated platform teams love it.

Rego has a steep learning curve. Writing a rule to require resource limits is maybe 20 lines of logic that most engineers find unintuitive. Debugging failed policies means reading Rego traces. For teams without a dedicated platform engineer, it quickly becomes a bottleneck.
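To make the comparison concrete, here is roughly what a Gatekeeper ConstraintTemplate requiring resource limits looks like. This is a sketch, not a drop-in policy — names like `K8sRequiredLimits` are illustrative, and you'd still need a separate Constraint resource to activate it:

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlimits
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLimits
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlimits

        # One violation per container missing a CPU limit
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.limits.cpu
          msg := sprintf("container %v has no CPU limit", [container.name])
        }

        # One violation per container missing a memory limit
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.limits.memory
          msg := sprintf("container %v has no memory limit", [container.name])
        }
```

Compare that with the Kyverno equivalent later in this post: same intent, but the Rego version requires learning a new language and a two-resource model.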

Custom Admission Webhooks

You can write your own webhook in Go or any language that speaks HTTP. Total flexibility, zero external dependencies. You also own every line of that code — bugs, TLS certificate rotation, and availability requirements included. A broken webhook can block all deployments.

I’ve seen this approach work well exactly once, at a company with a six-person platform team. Everywhere else it becomes technical debt that nobody wants to touch.

Kyverno

Kyverno is Kubernetes-native. Policies are Kubernetes resources — CRDs written in YAML. If your team can read a Deployment manifest, they can read a Kyverno policy. No new language to learn.

It covers three use cases that handle 90% of what teams actually need: validation (reject bad configs), mutation (automatically fix or augment resources), and generation (create related resources automatically).

Pros and Cons: Honest Assessment

Kyverno Pros

  • Low barrier to entry — policies are just YAML, readable by anyone on the team
  • Audit mode — dry-run policies to see what would fail before you enforce anything
  • Mutation support — automatically inject sidecars, add labels, set defaults
  • Policy reports — built-in CRDs that show compliance status across namespaces
  • Active development — CNCF incubating project with frequent releases

Kyverno Cons

  • Complex logic is awkward — deeply nested conditionals get messy in YAML
  • Rego is more expressive — if your policies need advanced logic, OPA wins on expressiveness
  • Webhook availability matters — Kyverno runs as a webhook; you need HA deployment in production
  • Relatively newer — less community-contributed policy content than OPA, though this gap is closing fast

My Take

For most teams — especially those without dedicated platform engineering — Kyverno is the right default. We run it across four clusters handling ~200 deployments per week. Rollout took two days. In six months, I haven’t touched the core setup once. That’s the kind of boring stability you want from security tooling.

Recommended Setup for Production

Before writing a single policy, get the deployment right. A Kyverno webhook that goes down will block all API server requests in its scope. That’s catastrophic.

Run at least three replicas in production:

# Install via Helm (recommended for production)
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update

helm install kyverno kyverno/kyverno \
  --namespace kyverno \
  --create-namespace \
  --set replicaCount=3 \
  --set admissionController.replicas=3 \
  --set backgroundController.replicas=2 \
  --set cleanupController.replicas=2

Check the webhook configuration after install. By default Kyverno sets failurePolicy: Fail — if Kyverno is unreachable, all resource creation fails. During initial rollout, consider switching to failurePolicy: Ignore, then tighten it once you’ve confirmed Kyverno is stable.

# Verify all pods are running
kubectl get pods -n kyverno

# Check webhook configs
kubectl get validatingwebhookconfigurations | grep kyverno
kubectl get mutatingwebhookconfigurations | grep kyverno
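It's also worth checking which failure policy the webhooks actually carry. The column expression below should work with any recent kubectl; the Kyverno webhook configuration names vary by release, hence the grep:

```shell
# Show the failurePolicy of each Kyverno validating webhook
kubectl get validatingwebhookconfigurations \
  -o custom-columns='NAME:.metadata.name,POLICY:.webhooks[*].failurePolicy' \
  | grep kyverno
```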

Implementation Guide: Start with Audit, Then Enforce

Going straight to enforcement mode is the fastest way to lose your team’s trust. One policy blocking a critical deployment at the wrong moment poisons the whole initiative. Start with Audit.

Step 1: Require Resource Limits (the incident that started this)

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
  annotations:
    policies.kyverno.io/title: Require Resource Limits
    policies.kyverno.io/description: All containers must define CPU and memory limits.
spec:
  validationFailureAction: Audit  # Start here, change to Enforce later
  background: true
  rules:
    - name: check-resource-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory limits are required for all containers."
        pattern:
          spec:
            containers:
              - name: "*"
                resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"

# Apply the policy
kubectl apply -f require-resource-limits.yaml

# Check what would fail (audit mode)
kubectl get policyreport -A
kubectl get clusterpolicyreport

Step 2: Block Privileged Containers

Containers running as root or in privileged mode are a classic attack vector — and more common than you’d expect in prod. This policy blocks them:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
spec:
  validationFailureAction: Audit
  background: true
  rules:
    - name: check-privileged
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Privileged containers are not allowed."
        pattern:
          spec:
            containers:
              - =(securityContext):
                  =(privileged): "false"

Step 3: Auto-inject Labels with Mutation

Mutation is where Kyverno earns its keep beyond validation. Instead of rejecting a Deployment for missing labels, just add them automatically:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-labels
spec:
  rules:
    - name: add-team-label
      match:
        any:
          - resources:
              kinds:
                - Deployment
      mutate:
        patchStrategicMerge:
          metadata:
            labels:
              +(managed-by): "kyverno"
              +(environment): "production"

Step 4: Review Policy Reports

# See all violations across the cluster
kubectl get policyreport -A -o wide

# Detailed report for a specific namespace
kubectl describe policyreport -n default

# Filter just failures
kubectl get policyreport -A -o json | \
  jq '.items[].results[] | select(.result == "fail")'

Give yourself a week with audit reports before touching enforcement. Fix existing violations in your workloads. Then flip policies to Enforce one at a time — never all at once:

# Patch a policy from Audit to Enforce
kubectl patch clusterpolicy require-resource-limits \
  --type=merge \
  -p '{"spec":{"validationFailureAction":"Enforce"}}'

Step 5: Use the Kyverno Policy Library

Before writing custom policies, check the official library. It already covers Pod Security Standards, CIS benchmarks, and common best practices:

# Browse and apply community policies
kubectl apply -f https://raw.githubusercontent.com/kyverno/policies/main/pod-security/baseline/disallow-host-namespaces/disallow-host-namespaces.yaml

Handling Exceptions Without Undermining the Policy

Some workloads legitimately need exceptions; a node-level monitoring agent that requires privileged access is a common example. Kyverno handles this cleanly with exclude blocks and PolicyException resources (the latter available since Kyverno 1.9).
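The simplest mechanism is an exclude block directly inside a rule. This sketch exempts kube-system from the privileged-container check (the rule mirrors the policy from Step 2):

```yaml
rules:
  - name: check-privileged
    match:
      any:
        - resources:
            kinds:
              - Pod
    exclude:
      any:
        - resources:
            namespaces:
              - kube-system
```

For per-workload carve-outs that live in their own reviewable resource, use PolicyException: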

apiVersion: kyverno.io/v2beta1
kind: PolicyException
metadata:
  name: allow-monitoring-privileged
  namespace: monitoring
spec:
  exceptions:
    - policyName: disallow-privileged-containers
      ruleNames:
        - check-privileged
  match:
    any:
      - resources:
          kinds:
            - Pod
          namespaces:
            - monitoring
          names:
            - node-exporter-*

Exceptions are Kubernetes resources — version-controlled, reviewable, and auditable. No more undocumented bypass decisions made at 2 AM by whoever was on call.

What This Looks Like After Six Months

That original deployment? It would never reach the API server now. The developer gets a clear error message explaining exactly what’s missing. They fix it, push again, it works. Two minutes instead of a three-hour outage and a post-mortem.

Policy reports changed how we handle compliance audits too. “Are all production workloads running without root access?” now takes thirty seconds to answer instead of a week of manual review.
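That thirty-second answer is a one-liner against the policy reports. Assuming the disallow-privileged-containers policy from Step 2, something like this counts the outstanding failures (field names follow the PolicyReport CRD):

```shell
# Count privileged-container violations across all namespaces
kubectl get policyreport -A -o json | jq '
  [.items[].results[]
   | select(.policy == "disallow-privileged-containers" and .result == "fail")]
  | length'
```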

Running Kubernetes in production with code review as your only enforcement layer is a liability. Install Kyverno in audit mode this week — the setup takes ten minutes, and the first policy report will almost certainly show you something you didn’t expect.
