The Shift to Serverless Containers
Managing EC2 instances for container orchestration often feels like a never-ending cycle of 2 AM pager alerts. You have to patch the OS, monitor disk space, and tweak scaling groups just to keep your Docker containers breathing.
When I migrated my first cluster from self-managed EC2 to AWS ECS Fargate, the operational burden vanished almost overnight. Fargate lets you run containers without provisioning or patching a single server. Pair it with Terraform and you get a repeatable, version-controlled environment that you can rebuild the same way every time.
I’ve found that mastering this stack is the fastest way to build production-ready systems. You get to focus on your code while AWS handles the heavy lifting of infrastructure maintenance. This guide skips the fluff and shows you how to build a functional, auto-scaling Fargate service from scratch.
Quick Start: The 5-Minute Cluster
Everything starts with an ECS cluster. Think of this as your logical sandbox. Unlike traditional clusters, a Fargate-backed cluster doesn’t require you to provision or pay for underlying EC2 capacity upfront.
# provider.tf
provider "aws" {
  region = "us-east-1"
}

# cluster.tf
resource "aws_ecs_cluster" "main" {
  name = "production-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}
Fire off terraform init and terraform apply. You now have a namespace ready for action. But a cluster alone is just an empty shell. We need networking and task definitions to actually run a workload.
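Before running terraform init, it can help to pin the Terraform and provider versions so the same configuration resolves to the same provider on every machine. The sketch below is one way to do that; the file name and both version constraints are assumptions, not values from this guide, so adjust them to whatever you have tested.

```hcl
# versions.tf (hypothetical file name)
terraform {
  # Assumed minimum Terraform version; change to match your tooling.
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      # Assumed constraint; pin to the major version you have tested.
      version = "~> 5.0"
    }
  }
}
```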
Building the Infrastructure Foundation
A production-grade Fargate setup relies on three pillars: Networking, IAM Roles, and the Task Definition.
1. Networking for Fargate
Fargate tasks must live inside a VPC. For a secure setup, place your tasks in private subnets. Use an Application Load Balancer (ALB) in public subnets to handle incoming traffic. Note: Since your tasks are in private subnets, you will need a NAT Gateway or VPC Endpoints to pull images from ECR.
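# Simplified VPC setup
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_security_group" "ecs_tasks" {
  name   = "ecs-tasks-sg"
  vpc_id = aws_vpc.main.id

  # Only accept traffic from the load balancer's security group.
  # aws_security_group.alb must be defined elsewhere in your configuration.
  ingress {
    protocol        = "tcp"
    from_port       = 80
    to_port         = 80
    security_groups = [aws_security_group.alb.id]
  }

  # Allow all outbound traffic (image pulls, logs, downstream APIs).
  egress {
    protocol    = "-1"
    from_port   = 0
    to_port     = 0
    cidr_blocks = ["0.0.0.0/0"]
  }
}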
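The security group above refers to aws_security_group.alb, and the service later refers to aws_subnet.private, but neither is defined in this simplified setup. A minimal sketch of both might look like the following. The CIDR range, availability zone, and open-to-the-world HTTP rule are assumptions for illustration, and the NAT Gateway or VPC Endpoint routing for the private subnet is not shown.

```hcl
# Assumed private subnet for the tasks; CIDR and AZ are placeholders.
resource "aws_subnet" "private" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"
}

# Assumed security group for the ALB: accepts HTTP from anywhere.
resource "aws_security_group" "alb" {
  name   = "alb-sg"
  vpc_id = aws_vpc.main.id

  ingress {
    protocol    = "tcp"
    from_port   = 80
    to_port     = 80
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    protocol    = "-1"
    from_port   = 0
    to_port     = 0
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

For production you would normally create at least two private subnets in different availability zones so the service survives the loss of one zone.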
2. IAM Roles: Execution vs. Task Role
This is where many engineers get stuck. You need two distinct roles to make this work. The Execution Role is for the ECS agent; it pulls your image and sends logs to CloudWatch. The Task Role is for your application code, allowing it to talk to services like S3 or DynamoDB.
resource "aws_iam_role" "ecs_task_execution_role" {
  name = "ecs-task-execution-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "ecs_execution_standard" {
  role       = aws_iam_role.ecs_task_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
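The snippet above covers only the Execution Role. A Task Role follows the same pattern but carries the permissions your application code needs. Here is a rough sketch, assuming the app only reads objects from a single S3 bucket; the role name, policy name, and bucket ARN are all hypothetical.

```hcl
# Task Role: assumed by the application code running inside the container.
resource "aws_iam_role" "ecs_task_role" {
  name = "ecs-task-role" # hypothetical name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

# Example permission set: read-only access to one bucket (hypothetical ARN).
resource "aws_iam_role_policy" "task_s3_read" {
  name = "task-s3-read"
  role = aws_iam_role.ecs_task_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject"]
      Resource = "arn:aws:s3:::example-app-bucket/*"
    }]
  })
}
```

To use it, you would set task_role_arn = aws_iam_role.ecs_task_role.arn on the task definition, alongside execution_role_arn.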
3. The Task Definition and Service
The Task Definition is your container’s DNA. It defines the image, CPU (e.g., 256 for 0.25 vCPU), and memory. The Service then acts as a manager, ensuring your desired number of tasks stay healthy.
resource "aws_ecs_task_definition" "app" {
  family                   = "my-app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256" # 0.25 vCPU
  memory                   = "512" # MiB
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn

  container_definitions = jsonencode([
    {
      name      = "app-container"
      image     = "nginx:latest"
      essential = true
      portMappings = [{
        containerPort = 80
        hostPort      = 80 # with awsvpc, must match containerPort
      }]
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          # The log group must already exist before the task starts.
          "awslogs-group"         = "/ecs/my-app"
          "awslogs-region"        = "us-east-1"
          "awslogs-stream-prefix" = "ecs"
        }
      }
    }
  ])
}

resource "aws_ecs_service" "main" {
  name            = "my-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    security_groups = [aws_security_group.ecs_tasks.id]
    # aws_subnet.private must be defined elsewhere in your configuration.
    subnets = [aws_subnet.private.id]
  }
}
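The service above is not yet registered with the ALB described in the networking section. One way to wire it up is sketched below. The resource names, the health check path, and the two public subnets (aws_subnet.public_a and aws_subnet.public_b) are assumptions for illustration; an ALB needs subnets in at least two availability zones.

```hcl
resource "aws_lb" "app" {
  name               = "my-app-alb" # hypothetical name
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  # Hypothetical public subnets in two different availability zones.
  subnets = [aws_subnet.public_a.id, aws_subnet.public_b.id]
}

resource "aws_lb_target_group" "app" {
  name        = "my-app-tg" # hypothetical name
  port        = 80
  protocol    = "HTTP"
  vpc_id      = aws_vpc.main.id
  target_type = "ip" # awsvpc tasks register by IP address, not instance ID

  health_check {
    path = "/" # assumed health check path
  }
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.app.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}

# Then add this block inside resource "aws_ecs_service" "main":
#
#   load_balancer {
#     target_group_arn = aws_lb_target_group.app.arn
#     container_name   = "app-container"
#     container_port   = 80
#   }
```

The container_name and container_port values must match the name and port in the task definition's container_definitions.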
Scaling for Peak Demand
Static task counts fail during traffic spikes. If your app gets featured on a major news site, you need to add capacity within minutes, not hours. AWS Application Auto Scaling adjusts your desired_count based on CloudWatch metrics.
Target tracking is the smartest approach here. It acts like a thermostat for your infrastructure, adding capacity when things get hot and cooling down when traffic drops.
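# A scaling policy needs a registered scalable target; without one, apply fails.
# The min and max capacity values here are assumptions; tune them to your workload.
resource "aws_appautoscaling_target" "ecs" {
  min_capacity       = 2
  max_capacity       = 10
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.main.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "ecs_policy_cpu" {
  name               = "cpu-autoscaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 70.0
  }
}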
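Target tracking tries to hold average CPU near 70%. When utilization runs above that target, ECS spins up more tasks; when the load subsides, it scales in gradually to protect your budget. Because autoscaling now owns the task count, consider adding lifecycle { ignore_changes = [desired_count] } to the service so terraform apply doesn’t reset it.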
Hard-Won Lessons from the Field
Running Fargate in production for several years has taught me a few critical lessons that don’t always appear in the documentation.
The ‘:latest’ Tag Trap
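Terraform won’t detect a change if you simply push a new image to the same :latest tag, because nothing in your configuration changed. Setting force_new_deployment = true in your aws_ecs_service helps, but it only takes effect when Terraform updates the service for some other reason. To roll out on every terraform apply, pair it with a triggers map whose value changes on each run. The more predictable fix is to stop using :latest and deploy immutable tags, such as a Git commit SHA, so every release shows up as a task definition change.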
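As a rough sketch of the redeploy-on-every-apply approach, the service can combine force_new_deployment with the triggers argument. This assumes an AWS provider version that supports triggers on aws_ecs_service and Terraform 1.5 or later for plantimestamp(). The key name redeployment is arbitrary.

```hcl
resource "aws_ecs_service" "main" {
  name            = "my-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  # Ask ECS to start fresh tasks whenever the service is updated.
  force_new_deployment = true

  # plantimestamp() changes on every plan, so the service is updated
  # (and therefore redeployed) on every apply.
  triggers = {
    redeployment = plantimestamp()
  }

  network_configuration {
    security_groups = [aws_security_group.ecs_tasks.id]
    subnets         = [aws_subnet.private.id]
  }
}
```

The trade-off is that every plan now shows a change, even when nothing else differs, which is one reason immutable image tags tend to be easier to reason about.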
Visibility is Everything
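You cannot SSH into a Fargate container. ECS Exec can open a shell in a running task if you enable it, but it won’t help with a container that has already crashed, so logs are your main lifeline. Always configure the awslogs driver and make sure the log group exists. Without it, debugging a 500 error becomes a guessing game instead of a technical process.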
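The awslogs driver writes to a CloudWatch log group that has to exist before the task starts. A minimal sketch of managing that group in Terraform follows; the name matches the awslogs-group value used in the task definition earlier, while the resource label and the 30-day retention period are assumptions.

```hcl
resource "aws_cloudwatch_log_group" "app" {
  # Must match the "awslogs-group" option in the container definition.
  name = "/ecs/my-app"

  # Assumed retention; without this setting, logs are kept indefinitely.
  retention_in_days = 30
}
```

Referencing aws_cloudwatch_log_group.app.name in the container definition, instead of repeating the string, also gives Terraform the dependency it needs to create the group first.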
Handle SIGTERM Gracefully
When ECS stops a task, for example while deploying a new version, it sends a SIGTERM to the old containers. By default, your application has 30 seconds to finish its in-flight work before ECS follows up with a SIGKILL. If your app handles long-running jobs, raise stopTimeout in your container definition (Fargate allows up to 120 seconds) and make sure the process actually traps SIGTERM, or you risk data corruption.
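The stopTimeout setting lives inside the container definition, next to the image and port mappings. Here is a sketch of a separate task definition for a long-running worker. The family name, container name, and image are hypothetical, and 120 is used because it is the upper limit on Fargate.

```hcl
resource "aws_ecs_task_definition" "worker" {
  family                   = "my-worker" # hypothetical family name
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn

  container_definitions = jsonencode([
    {
      name      = "worker-container"     # hypothetical container name
      image     = "example/worker:1.0.0" # hypothetical image
      essential = true

      # Seconds between SIGTERM and SIGKILL when the task is stopped.
      stopTimeout = 120
    }
  ])
}
```

A longer timeout only helps if the process handles SIGTERM; a shell wrapper that does not forward signals to the application will still be killed when the timeout expires.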
Slash Costs with Fargate Spot
Fargate can get expensive if you run large clusters 24/7. For development environments or non-critical background workers, use Fargate Spot. It runs your tasks on spare AWS capacity at a discount of up to 70%, provided you can handle a two-minute termination notice.
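In Terraform, Fargate Spot is selected through a capacity provider strategy rather than launch_type. The following is a sketch under assumptions: the service name is hypothetical, and it reuses the cluster, task definition, security group, and subnet names from earlier in this guide.

```hcl
# Make both capacity providers available to the cluster.
resource "aws_ecs_cluster_capacity_providers" "main" {
  cluster_name       = aws_ecs_cluster.main.name
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]
}

resource "aws_ecs_service" "worker" {
  name            = "background-worker" # hypothetical service name
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2

  # launch_type is omitted on purpose: it cannot be combined
  # with capacity_provider_strategy.
  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 1
  }

  network_configuration {
    security_groups = [aws_security_group.ecs_tasks.id]
    subnets         = [aws_subnet.private.id]
  }

  # The cluster must have the providers attached before the service uses them.
  depends_on = [aws_ecs_cluster_capacity_providers.main]
}
```

If you want a safety net, you can add a second strategy block for FARGATE with a base value so that a minimum number of tasks always run on regular capacity.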
Terraform boilerplate can feel heavy at first. However, the trade-off is a rock-solid environment that scales without manual intervention. Once your HCL files are ready, launching a new microservice takes minutes, not days.

