Stop Copy-Pasting Your Infrastructure: A Practical Guide to Terragrunt

The Real-World Headache of Vanilla Terraform

Terraform is a joy to use—at first. You write a few resource blocks, run terraform apply, and watch your infrastructure appear in the cloud. But the honeymoon ends when your project scales. Once you move from a single sandbox to managing Development, Staging, and Production, you hit a wall of manual repetition.

Suddenly, you are copy-pasting the same backend configurations, provider blocks, and module calls into dozens of folders. If you manage five environments with ten modules each, you now have 50 backend.tf files to maintain.

This duplication is a maintenance nightmare. If you need to update a provider version or a global tag, you have to remember to change it in every single directory. Forget just one, and your environments drift apart, leading to “it works in Dev” bugs that haunt your production releases.

Comparing the Two Worlds

To see why Terragrunt has become a staple in modern DevOps, let’s look at how it changes the way we structure our code compared to standard Terraform.

The Standard Terraform Setup

In a typical vanilla setup, your folder structure likely looks like this:

infrastructure/
├── dev/
│   ├── main.tf
│   ├── variables.tf
│   └── backend.tf
├── prod/
│   ├── main.tf
│   ├── variables.tf
│   └── backend.tf
└── modules/
    └── vpc/

Notice the redundancy? The dev/backend.tf and prod/backend.tf files are virtually identical, save for a single S3 bucket key. Your main.tf files are usually just wrappers that call the same module with slightly different variables. This violates the DRY (Don’t Repeat Yourself) principle and invites human error.
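
To make the duplication concrete, here is a sketch of what the two backend files might contain. The bucket, region, and lock table names are borrowed from the example later in this article, and the key values are assumptions for illustration. Terraform does not allow variables inside a backend block, which is why the whole block ends up copied per environment.

```hcl
# dev/backend.tf (illustrative values)
terraform {
  backend "s3" {
    bucket         = "my-company-terraform-state"
    key            = "dev/terraform.tfstate" # the only line that differs
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-lock-table"
  }
}

# prod/backend.tf (illustrative values)
terraform {
  backend "s3" {
    bucket         = "my-company-terraform-state"
    key            = "prod/terraform.tfstate" # the only line that differs
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-lock-table"
  }
}
```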

The Terragrunt Alternative

Terragrunt acts as a thin, intelligent wrapper for Terraform. Instead of defining backends and providers in every sub-folder, you define them once in a root configuration file. Your environment folders then contain only a terragrunt.hcl file. This file simply references your common logic and provides specific inputs. Terragrunt handles the tedious task of generating backend configurations and passing variables on the fly.

The Trade-offs: Is Terragrunt Worth It?

Adding a tool to your stack is never free. It is important to weigh the operational benefits against the added layer of abstraction.

The Benefits

DRY Backend Logic: You define your S3 or GCS backend once. Terragrunt can create the bucket if it’s missing, and it derives each state key from your folder structure.
Centralized Providers: Define your AWS or Azure provider in a single file and let every module inherit those settings.
Smart Dependency Management: Does your App module need a VPC ID? Terragrunt allows you to define dependencies explicitly. When you run commands across the whole tree, it processes modules in dependency order.
Mass Execution: Commands like terragrunt run-all plan (written as terragrunt run --all plan in newer releases) preview changes across 20+ modules in a single invocation. This can save a DevOps engineer 30 minutes of manual navigation per deployment cycle.
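
As a rough illustration of the Centralized Providers point, a generate block in the root configuration can write the same provider.tf into every module that includes it. This is a minimal sketch, not a complete setup: the region and the ManagedBy tag are assumed values chosen for the example.

```hcl
# live/terragrunt.hcl (sketch; region and tag values are assumptions)
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-1"

  # Tags applied to every resource that supports them
  default_tags {
    tags = {
      ManagedBy = "terragrunt"
    }
  }
}
EOF
}
```

Changing a global tag then means editing this one block instead of touching every environment folder.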

The Drawbacks

Tooling Overhead: It is another binary to install and update in your CI/CD pipelines (like GitHub Actions or GitLab CI).
Initial Learning Curve: New team members must learn both Terraform and the Terragrunt HCL syntax, which can be confusing at first.
Abstraction Layers: When a deployment fails, you have to check if the bug is in your Terraform module or your Terragrunt orchestration logic.

A Professional Project Structure

In production environments, I recommend separating your “infrastructure code” (the reusable modules) from your “live configuration” (the actual deployments). This keeps your logic clean and your deployments predictable.

my-repo/
├── modules/             # Reusable Terraform modules (The 'What')
│   ├── vpc/
│   └── ec2/
└── live/                # Environment-specific configs (The 'Where')
    ├── terragrunt.hcl   # Root configuration
    ├── dev/
    │   └── vpc/
    │       └── terragrunt.hcl
    └── prod/
        └── vpc/
            └── terragrunt.hcl

How to Implement Terragrunt Today

Let’s walk through a practical implementation to manage a VPC across two environments without the usual code bloat.

Step 1: The Root Configuration

Create a live/terragrunt.hcl file to handle your remote state. Every child configuration that includes this file gets a backend.tf generated for it automatically.

# live/terragrunt.hcl
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "my-company-terraform-state"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-lock-table"
  }
}

The path_relative_to_include() function is a lifesaver. If you are working in live/dev/vpc, it automatically sets your S3 key to dev/vpc/terraform.tfstate. No more manual path errors.
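
For a sense of the result, this is roughly what the generated backend.tf would look like when you run Terragrunt from live/dev/vpc with the root configuration above. Treat it as an approximation: the exact header comment and attribute order depend on your Terragrunt version.

```hcl
# backend.tf generated for live/dev/vpc (approximate contents)
terraform {
  backend "s3" {
    bucket         = "my-company-terraform-state"
    dynamodb_table = "terraform-lock-table"
    encrypt        = true
    key            = "dev/vpc/terraform.tfstate" # from path_relative_to_include()
    region         = "us-east-1"
  }
}
```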

Step 2: Define Your Environment

Your live/dev/vpc/terragrunt.hcl file now stays incredibly lean. It only needs to point to the module and provide the inputs.

# live/dev/vpc/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules/vpc"
}

inputs = {
  env           = "dev"
  vpc_cidr      = "10.0.0.0/16"
  enable_nat_gw = false
}
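
The production counterpart would follow the same shape, with only the inputs changing. The sketch below assumes a different CIDR range and an enabled NAT gateway for prod; both values are hypothetical and only illustrate how little differs between environments.

```hcl
# live/prod/vpc/terragrunt.hcl (sketch; input values are assumptions)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules/vpc"
}

inputs = {
  env           = "prod"
  vpc_cidr      = "10.1.0.0/16"
  enable_nat_gw = true
}
```

Terragrunt passes inputs to Terraform as TF_VAR_ environment variables, so the vpc module needs to declare matching variables (env, vpc_cidr, and enable_nat_gw) for these values to take effect.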

Step 3: Connecting the Dots with Dependencies

This is where Terragrunt outshines vanilla Terraform. If your EC2 instance needs the VPC ID, you don’t need to hardcode it or use complex data sources.

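# live/dev/ec2/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules/ec2"
}

# Read outputs from the VPC configuration in the sibling folder
dependency "vpc" {
  config_path = "../vpc"
}

inputs = {
  vpc_id        = dependency.vpc.outputs.vpc_id
  subnet_id     = dependency.vpc.outputs.public_subnets[0]
  instance_type = "t3.micro"
}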

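Terragrunt reads the outputs of the VPC configuration and passes them to the EC2 module as inputs. If the VPC hasn’t been applied yet, running plan or apply in the EC2 folder on its own fails because there are no outputs to read. Running terragrunt run-all apply from the environment folder avoids this by applying the VPC first and the EC2 module after it.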
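
One common way to plan against a dependency that has not been applied yet is to give it mock outputs. The sketch below is one possible version of the dependency block; the placeholder IDs are made up, and restricting mocks to validate and plan is a design choice that keeps fake values out of a real apply.

```hcl
# live/dev/ec2/terragrunt.hcl (dependency block with mocks; IDs are placeholders)
dependency "vpc" {
  config_path = "../vpc"

  # Used only when the VPC has no real outputs yet
  mock_outputs = {
    vpc_id         = "vpc-00000000"
    public_subnets = ["subnet-00000000"]
  }

  # Never substitute mocks during apply
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}
```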

Scaling Your IaC Without Losing Your Sanity

Early in my career, I thought standard Terraform was enough for any task. However, as the number of microservices grew, the “copy-paste” method led to several production outages caused by mismatched subnet IDs. Switching to Terragrunt eliminated that entire class of errors by enforcing a single source of truth.

If you are part of a team or managing more than one environment, Terragrunt isn’t just a “nice-to-have” utility. It is an essential tool for keeping your code clean and your deployments predictable. Start by migrating your backend configuration first. You will notice the difference in your daily workflow almost immediately.