Securing Infrastructure as Code with Checkov: A Guide to Scanning Terraform and Kubernetes in CI/CD

The Cost of a Single Misconfigured Line

I still remember a 3 AM PagerDuty alert back in 2021. A junior engineer had accidentally deployed an S3 bucket with public read access. It wasn’t a reckless mistake—just a default setting in a community Terraform module that slipped through peer review. Within 15 minutes, automated scanners had already pinged the endpoint. We avoided a data breach by pure luck, but it was a sobering realization. Security cannot be a ‘final check’ performed once a month; it must be baked into every commit.

As we scale with Infrastructure as Code (IaC), human error scales with it. Whether you are managing AWS environments with Terraform or orchestrating clusters with Helm, a single missing attribute can expose your entire stack. Manual reviews are valuable, but they don’t scale to thousands of lines of code, and they certainly aren’t foolproof.

Why Our IaC is Insecure by Default

Misconfigurations usually stem from three specific pressure points. First, cloud providers prioritize ‘onboarding velocity.’ They want you to see results in five minutes, which often leads to wide-open permissions and disabled encryption in their ‘Quick Start’ templates.

Second, the sheer volume of modern configuration is overwhelming. A typical microservices architecture might involve 5,000+ lines of YAML and HCL (HashiCorp Configuration Language). Expecting a tired reviewer to spot a missing allowPrivilegeEscalation: false in a container's securityContext, buried somewhere in a Kubernetes manifest, is simply unrealistic.

Finally, there is a growing knowledge gap. Most DevOps engineers are experts at orchestration but aren’t necessarily security researchers. Keeping pace with every new CVE or CIS benchmark for dozens of cloud services is a full-time job that most teams can’t afford.

Comparing the Scanners: Why Checkov?

When I evaluated tools to automate this, I looked at tfsec, Terrascan, and KICS. While each has its merits, I standardized on Checkov for four key reasons:

Broad Ecosystem Support: It handles more than just Terraform. It scans Kubernetes, CloudFormation, Bicep, ARM templates, and even Dockerfiles.
Massive Policy Library: It ships with over 1,000 built-in policies mapped to industry standards like CIS and NIST.
Python-Based Extensibility: Writing custom logic in Python is far more intuitive for most teams than learning a niche, tool-specific Domain Specific Language (DSL).
Graph-Based Analysis: Unlike simple text-search tools, Checkov understands resource relationships. It knows if a specific security group is actually attached to an instance, reducing false positives.
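
To make the graph idea concrete, here is a deliberately simplified Python sketch. It is not Checkov's graph engine. It models resources as plain dictionaries (the data layout and the function names are hypothetical) and only reports a security group that opens SSH to the internet when an instance actually references it, which is the kind of relationship a text-only scanner cannot see.

```python
"""Toy model of relationship-aware scanning (not Checkov's implementation)."""

OPEN_CIDR = "0.0.0.0/0"


def allows_public_ssh(sg: dict) -> bool:
    """True if any ingress rule opens port 22 to the whole internet."""
    for rule in sg.get("ingress", []):
        covers_22 = rule["from_port"] <= 22 <= rule["to_port"]
        if covers_22 and OPEN_CIDR in rule.get("cidr_blocks", []):
            return True
    return False


def exposed_security_groups(security_groups: dict, instances: dict) -> list:
    """Return names of SSH-open security groups that are attached to an instance."""
    # The "edges" of the graph: which security groups some instance references.
    attached = {
        sg_name
        for inst in instances.values()
        for sg_name in inst.get("security_groups", [])
    }
    return sorted(
        name
        for name, sg in security_groups.items()
        if name in attached and allows_public_ssh(sg)
    )


if __name__ == "__main__":
    sgs = {
        "bastion": {"ingress": [{"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]}]},
        "unused": {"ingress": [{"from_port": 0, "to_port": 65535, "cidr_blocks": ["0.0.0.0/0"]}]},
        "internal": {"ingress": [{"from_port": 22, "to_port": 22, "cidr_blocks": ["10.0.0.0/8"]}]},
    }
    instances = {"jump_host": {"security_groups": ["bastion", "internal"]}}
    print(exposed_security_groups(sgs, instances))  # ['bastion']
```

A real graph check would also follow references through modules, variables, and network interfaces; the point here is only that the attachment edge changes the verdict for the "unused" group.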

A Better Workflow: Integrating Checkov

The goal isn’t just to generate a report; it’s to create a feedback loop that stops bad code before it’s merged. I recommend a tiered strategy: local testing, pre-commit hooks, and CI/CD enforcement.

1. Local Testing and Setup

Getting started takes less than a minute. I prefer using Python’s package manager, though the Docker image is ideal for keeping your local environment clean.

# Install via pip
pip install checkov

# Or run via Docker, mounting the current directory at /tf (--rm removes the container afterwards)
docker run --rm -v "$(pwd)":/tf bridgecrew/checkov -d /tf

Launch a scan against your current Terraform directory. If you see a wall of red text, don’t panic. This is normal. Most production projects have accumulated legacy technical debt that needs to be addressed incrementally.

checkov -d ./terraform_project
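
When that first scan is noisy, it helps to see which checks account for most of the failures so you can fix them in batches. Checkov can emit machine-readable results with -o json. The sketch below assumes the JSON shape produced by recent versions (a single report object, or a list of them when several frameworks run, each with results.failed_checks entries carrying check_id and check_name); verify it against your own output first. The script name triage.py is hypothetical.

```python
"""Count failed Checkov checks by ID to plan incremental clean-up.

Assumed usage:
    checkov -d ./terraform_project -o json > report.json
    python triage.py report.json
"""
import json
import sys
from collections import Counter


def failed_checks(report) -> list:
    """Flatten failed checks from one report dict or a list of report dicts."""
    reports = report if isinstance(report, list) else [report]
    failures = []
    for rep in reports:
        # Summary-only reports (e.g. nothing scanned) have no "results" key.
        failures.extend(rep.get("results", {}).get("failed_checks", []))
    return failures


def top_offenders(report, limit: int = 10) -> list:
    """Return (check_id, check_name, count) tuples, most frequent first."""
    failures = failed_checks(report)
    counts = Counter(f["check_id"] for f in failures)
    names = {f["check_id"]: f.get("check_name", "") for f in failures}
    return [(cid, names[cid], n) for cid, n in counts.most_common(limit)]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        data = json.load(fh)
    for check_id, name, count in top_offenders(data):
        print(f"{count:4d}  {check_id}  {name}")
```

Fixing the most frequent check first usually clears the largest share of the noise with a single pattern change, often in a shared module.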

2. Securing Kubernetes Manifests

One of Checkov’s strongest features is its ability to catch ‘lazy’ Kubernetes configurations. It’s easy to forget resource limits or leave a container running as root. You can scan an entire directory of YAML files with one command:

checkov -d ./k8s_manifests --framework kubernetes

Checkov will immediately flag common issues, such as:

Containers running with privileged: true, which grants them near-unrestricted access to the host.
Missing liveness and readiness probes, which affect availability rather than security.
Pods without CPU/memory limits, which often lead to ‘noisy neighbor’ issues in shared clusters.
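
The three issues above are easy to express as rules over a parsed manifest. The following sketch is a simplified illustration written for this article, not Checkov's own check code. It inspects a Pod manifest that has already been loaded into a Python dict (for example with a YAML parser) and returns readable findings; the function name lint_pod is hypothetical.

```python
"""Simplified Pod checks mirroring the issues listed above (illustrative only)."""


def lint_pod(manifest: dict) -> list:
    """Return one finding string per problem, per container in a bare Pod manifest."""
    findings = []
    for c in manifest.get("spec", {}).get("containers", []):
        name = c.get("name", "<unnamed>")

        # 1. Privileged containers.
        if c.get("securityContext", {}).get("privileged") is True:
            findings.append(f"{name}: runs privileged")

        # 2. Missing health probes (an availability concern).
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in c:
                findings.append(f"{name}: missing {probe}")

        # 3. Missing resource limits (the noisy-neighbor risk).
        limits = c.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                findings.append(f"{name}: no {resource} limit")
    return findings


if __name__ == "__main__":
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "demo"},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": "nginx:1.25",
                    "securityContext": {"privileged": True},
                    "readinessProbe": {"httpGet": {"path": "/", "port": 80}},
                    "resources": {"limits": {"memory": "256Mi"}},
                }
            ]
        },
    }
    for finding in lint_pod(pod):
        print(finding)
```

This sketch only handles bare Pods; Deployments, StatefulSets, and similar workloads nest the same container list under spec.template.spec, which a real scanner also has to walk.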

3. Automated Enforcement in CI/CD

This is where the automation pays off. If a configuration fails the security scan, the build should stop. GitHub Actions makes this straightforward. Here is a minimal workflow to start from:

name: IaC Security Scan
on: [push, pull_request]

jobs:
  checkov-job:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4

      - name: Run Checkov
        uses: bridgecrewio/checkov-action@v12 # Pin a release tag (or commit SHA) rather than master
        with:
          directory: terraform/
          framework: terraform
          soft_fail: false # Fails the job when any check fails
          output_format: cli
          download_external_modules: true

With this setup, the workflow fails whenever a change introduces a failing check. To actually block the merge, mark this job as a required status check in your branch protection rules. Developers then have to either fix the misconfiguration or add a documented skip before the code reaches production.

Lessons from the Front Lines

After implementing Checkov across dozens of enterprise projects, I’ve found that a ‘turn-on-everything’ approach usually fails due to alert fatigue. Here is how to roll it out successfully.

Standardize with a Configuration File

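Don’t rely on long CLI flags. Create a .checkov.yaml file in your repository root; Checkov picks it up from the directory you run it in, and you can also point to a file explicitly with --config-file. This ensures every developer and the CI/CD runner use exactly the same rules, avoiding ‘it worked on my machine’ security issues.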

# .checkov.yaml
directory:
  - terraform
  - k8s
skip-check:
  - CKV_AWS_144 # Example: S3 cross-region replication is not required for these buckets
soft-fail: false
output: cli
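
Each key in the config file corresponds to a long-form CLI flag, so the file above amounts to a single command line. The sketch below makes that mapping explicit for the keys used here. The helper config_to_args is hypothetical, and the config is passed as an already-parsed dict so the example needs no YAML library.

```python
"""Translate a parsed .checkov.yaml dict into equivalent CLI arguments (sketch)."""


def config_to_args(config: dict) -> list:
    """Map the config keys used above onto their long-form checkov flags."""
    args = []
    for directory in config.get("directory", []):
        args += ["--directory", directory]  # --directory may be repeated
    skips = config.get("skip-check", [])
    if skips:
        args += ["--skip-check", ",".join(skips)]  # comma-separated check IDs
    if config.get("soft-fail"):
        args.append("--soft-fail")  # boolean flag: present only when true
    if "output" in config:
        args += ["--output", config["output"]]
    return args


if __name__ == "__main__":
    cfg = {
        "directory": ["terraform", "k8s"],
        "skip-check": ["CKV_AWS_144"],
        "soft-fail": False,
        "output": "cli",
    }
    # checkov --directory terraform --directory k8s --skip-check CKV_AWS_144 --output cli
    print(" ".join(["checkov"] + config_to_args(cfg)))
```

Keeping these settings in the file rather than in scripts means a rule change is a reviewed commit instead of an edited pipeline step.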

Handling Exceptions Gracefully

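Security rules aren’t always one-size-fits-all. Instead of disabling a rule globally, use an inline comment to skip it for a specific resource. Checkov records any text after the check ID as the suppression reason and reports it with skipped checks, so I always require a meaningful justification there to maintain an audit trail for future reviews.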

# terraform
resource "aws_security_group" "bastion" {
  # CKV_AWS_24 ensures no security group allows ingress from 0.0.0.0/0 to port 22.
  # checkov:skip=CKV_AWS_24:Hardened public bastion host reserved for emergency SSH access
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
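
Because the justification lives in a comment, it is easy to enforce mechanically. Here is a minimal sketch of a hypothetical pre-commit check (skips_without_reason is not part of Checkov) that scans HCL or YAML text for checkov:skip annotations and reports any that have no text after the check ID. It assumes the checkov:skip=CHECK_ID:reason annotation form.

```python
"""Flag checkov:skip annotations that carry no justification (illustrative)."""
import re

# Matches "checkov:skip=CKV_AWS_24:Some reason" as well as a bare "checkov:skip=CKV_AWS_24".
SKIP_RE = re.compile(r"checkov:skip=([A-Z0-9_]+)(?::(.*))?")


def skips_without_reason(text: str) -> list:
    """Return (line_number, check_id) for every skip annotation lacking a reason."""
    missing = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = SKIP_RE.search(line)
        if match and not (match.group(2) or "").strip():
            missing.append((lineno, match.group(1)))
    return missing


if __name__ == "__main__":
    sample = (
        'resource "aws_security_group" "bastion" {\n'
        "  # checkov:skip=CKV_AWS_24:Hardened bastion for emergency SSH\n"
        "}\n"
        'resource "aws_s3_bucket" "logs" {\n'
        "  # checkov:skip=CKV_AWS_18\n"
        "}\n"
    )
    for lineno, check_id in skips_without_reason(sample):
        print(f"line {lineno}: {check_id} skipped without a reason")
```

Run against the sample, it reports only line 5, the bare skip on the logs bucket.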

The ‘Phased’ Rollout

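If you are scanning a legacy project for the first time, you might see 500+ failed checks. Don’t try to fix them all in one sprint. In recent Checkov versions, the --check flag accepts severities (LOW, MEDIUM, HIGH, CRITICAL) as well as check IDs, so you can focus on ‘Critical’ and ‘High’ findings first. Severity metadata comes from the Prisma Cloud platform, so this filter typically only takes effect when Checkov runs with an API key (--bc-api-key); without one, list the specific check IDs you want to enforce instead. Once the ‘Criticals’ are at zero, gradually raise the bar to include ‘Medium’ and ‘Low’ checks.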

checkov -d . --check CRITICAL,HIGH
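
If your scans do include severity data, you can apply the same ‘Critical first’ policy to a saved JSON report, for example to decide whether a later pipeline stage should fail. The sketch assumes each failed check in the -o json output has a severity field holding a level name or null (null is what you would expect without platform severity data); the threshold logic and the name blocking_failures are illustrative, not a Checkov feature.

```python
"""Gate on failed checks at or above a severity threshold (illustrative sketch)."""
import json
import sys

LEVELS = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


def blocking_failures(report, threshold: str = "HIGH", block_unknown: bool = False) -> list:
    """Return IDs of failed checks whose severity is at or above the threshold.

    Checks with a missing or unrecognized severity are ignored unless
    block_unknown is True, in which case they always block.
    """
    minimum = LEVELS[threshold.upper()]
    reports = report if isinstance(report, list) else [report]
    blocking = []
    for rep in reports:
        for check in rep.get("results", {}).get("failed_checks", []):
            severity = (check.get("severity") or "").upper()
            if severity not in LEVELS:
                if block_unknown:
                    blocking.append(check["check_id"])
            elif LEVELS[severity] >= minimum:
                blocking.append(check["check_id"])
    return blocking


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        ids = blocking_failures(json.load(fh))
    print("\n".join(ids) or "no blocking findings")
    sys.exit(1 if ids else 0)
```

Ignoring unknown severities by default keeps the gate lenient during a phased rollout; stricter pipelines can pass block_unknown=True so that unclassified findings are never waved through silently.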

Final Thoughts

Securing infrastructure is a marathon, not a sprint. By using Checkov, you remove the guesswork and provide your team with immediate, actionable feedback. You aren’t just scanning code; you are building a culture where security is treated with the same rigor as application logic. Start small, block the ‘Criticals’ in your pipeline, and sleep better knowing your cloud defaults aren’t leaving the door wide open.