How I Ended Up Choosing Ansible Over Everything Else
Six months ago, I was managing three Ubuntu servers by hand — SSH in, run a script, hope nothing breaks. When we added a fourth server, I finally hit the wall. Half a day gone because one server was running nginx 1.18 while the others were on 1.24, and a try_files directive behaved differently between versions. That afternoon I committed to proper configuration management.
The contenders: Ansible, Puppet, Chef, and SaltStack. After a weekend of actually standing each one up on a test VM (not just reading docs), I went with Ansible. I’ve been running it in production every week since. Here’s what six months looks like.
Approach Comparison: Ansible vs The Alternatives
Reading blog posts only gets you so far. The differences become apparent the moment you try to configure something real, which is why I spent time hands-on with each tool before deciding.
Manual Bash Scripts
Bash scripts are the baseline everyone starts with. One server? Fine. Add a second and the cracks appear immediately — no idempotency, no state tracking, no clean rollback. A script that fails at step 7 of 12 leaves you debugging a half-configured machine with no clear path back. I still use bash for simple one-off tasks, but it’s not configuration management.
Puppet and Chef
Both are mature, battle-tested tools. The problem is what comes with them. Puppet has its own DSL you need to learn. Chef requires Ruby. Both need a dedicated master server running 24/7. For a solo engineer or a two-person team managing 15 servers, standing up Puppet infrastructure takes a full day — and then keeping it healthy is its own ongoing job. I’d rather spend an afternoon on a playbook.
Terraform
Terraform is excellent, just at a different layer. It handles infrastructure provisioning: spinning up VMs, creating databases, configuring VPC networks. What happens inside those VMs after they boot is a different problem entirely. I use Terraform and Ansible together. One creates the server; the other configures it.
Ansible
No agent to install on managed hosts. No master server to maintain. YAML playbooks that read almost like a checklist. SSH does the heavy lifting. For anyone already comfortable with Linux and SSH, picking up Ansible feels like extending what you already know rather than learning a foreign system from scratch.
Once it clicked for me, I cut a 3-hour server setup process down to about 12 minutes. That kind of time saving stacks up fast when you’re provisioning multiple servers a month.
Pros and Cons After Six Months
What Actually Works Well
- Agentless architecture — Nothing to install on managed hosts. SSH access is all you need. Rolling this out across ten existing servers took under an hour, with zero changes on the servers themselves.
- Idempotency — Run the same playbook ten times, same result. This makes it safe to include playbook runs in CI/CD pipelines without worrying about side effects on the 4th or 5th execution.
- Readable playbooks — I handed a playbook to a junior sysadmin who’d never touched Ansible. He read it top to bottom without any explanation from me. YAML descriptions map closely to plain English in a way that Puppet DSL or Chef recipes simply don’t.
- Massive module library — Package installation, user management, file templates, service management, Docker, AWS, GCP — there’s a module for almost everything. In six months of daily use, I’ve needed raw shell commands maybe five times.
- Inventory flexibility — Static files work fine for small fixed setups. Switch to a dynamic inventory plugin in AWS or GCP and servers come and go without any manual file editing.
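As a sketch of that last point, here is roughly what a dynamic inventory file for the aws_ec2 plugin looks like (assumes the amazon.aws collection is installed; the region and tag key are placeholders for your own):

```yaml
# inventory/aws_ec2.yml -- dynamic inventory sketch
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
# Build groups from the value of each instance's "Role" tag
keyed_groups:
  - key: tags.Role
    prefix: tag_role
filters:
  # Only pick up running instances
  instance-state-name: running
```

You can sanity-check what it resolves to with ansible-inventory -i inventory/aws_ec2.yml --graph before pointing any playbook at it.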
Where Ansible Falls Short
- Speed at scale — Sequential by default. A playbook across 50 servers takes noticeably longer than tools with native parallel execution. The forks setting helps — I use forks = 20 in production — but at 200+ servers this becomes a real bottleneck.
- Error messages — When a task fails, the output can be genuinely cryptic. My usual workflow: add -vvv, pipe output to a file, search for "FAILED" or "fatal". Not elegant, but it works.
- No built-in drift detection — Ansible doesn't continuously monitor servers for configuration drift. If someone manually edits /etc/nginx/nginx.conf between playbook runs, Ansible has no idea until you run again. Puppet handles this natively, which is a real gap.
- Variable precedence complexity — Ansible has 22 levels of variable precedence. Once you're juggling group vars, host vars, role defaults, and extra vars at the same time, figuring out which value actually wins requires checking the documentation every time.
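To make the precedence point concrete, here is a sketch of the same variable defined at three levels (the variable name and values are illustrative, not from my actual setup). Extra vars beat group vars, which beat role defaults:

```yaml
# roles/nginx/defaults/main.yml -- role default, lowest of the three
nginx_port: 80

# group_vars/webservers.yml -- overrides the role default
# nginx_port: 8080

# Command line extra vars win over both:
#   ansible-playbook site.yml -e "nginx_port=9090"
# The play ends up seeing nginx_port = 9090.
```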
Recommended Setup for Small to Mid-Size Teams
After six months of iteration, here’s the structure I’d recommend starting with. It scales cleanly from 3 servers to 30 without a major reorganization.
ansible/
├── inventory/
│   ├── production
│   └── staging
├── group_vars/
│   ├── all.yml
│   ├── webservers.yml
│   └── dbservers.yml
├── roles/
│   ├── common/
│   │   ├── tasks/main.yml
│   │   └── templates/
│   ├── nginx/
│   └── docker/
├── playbooks/
│   ├── site.yml
│   ├── webservers.yml
│   └── deploy.yml
└── ansible.cfg
Three decisions that matter most here:
- Separate inventory files for production and staging. The one time I ran a destructive playbook against production instead of staging taught me this lesson permanently.
- Roles for anything you’ll reuse — nginx setup, Docker installation, common security hardening. Roles are self-contained, testable, and easy to share across projects.
- Group vars to keep environment-specific config separate from playbook logic. Production uses port 443; staging uses 8443. That difference belongs in vars, not hardcoded in the playbook itself.
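For instance, the port difference can live entirely in group vars (the variable name here is illustrative), so the playbook itself never mentions an environment:

```yaml
# group_vars/webservers.yml -- production inventory
nginx_listen_port: 443

# The staging inventory's copy of this file would set:
# nginx_listen_port: 8443
#
# Tasks and templates then reference {{ nginx_listen_port }}
# instead of a hardcoded number.
```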
A minimal but sensible ansible.cfg:
[defaults]
inventory = inventory/production
remote_user = ubuntu
private_key_file = ~/.ssh/id_ed25519
host_key_checking = False
forks = 10
callbacks_enabled = profile_tasks
[privilege_escalation]
become = True
become_method = sudo
Implementation Guide: From Zero to Running Playbook
Step 1: Install Ansible
# On Ubuntu/Debian control machine
sudo apt update && sudo apt install -y ansible
# Verify installation
ansible --version
Step 2: Create Your Inventory
The inventory file tells Ansible which servers to manage and how to group them. Keep it simple at first.
# inventory/production
[webservers]
web01 ansible_host=192.168.1.10
web02 ansible_host=192.168.1.11
[dbservers]
db01 ansible_host=192.168.1.20
[all:vars]
ansible_user=ubuntu
ansible_ssh_private_key_file=~/.ssh/id_ed25519
Before writing a single playbook, confirm connectivity:
ansible all -i inventory/production -m ping
Three green pong responses and you’re in business.
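Beyond ping, ad-hoc commands are handy for quick one-off checks before you have any playbooks at all:

```shell
# Check uptime on every web server
ansible webservers -i inventory/production -m command -a "uptime"

# Gather OS facts from a single host
ansible web01 -i inventory/production -m setup -a "filter=ansible_distribution*"
```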
Step 3: Write Your First Playbook
Start concrete. A baseline configuration applied to every server is the highest-value first playbook — it runs automatically on new servers and keeps existing ones consistent.
# playbooks/site.yml
---
- name: Apply common configuration to all servers
  hosts: all
  become: true
  tasks:
    - name: Update apt package cache
      apt:
        update_cache: yes
        cache_valid_time: 3600
    - name: Install essential packages
      apt:
        name:
          - vim
          - curl
          - htop
          - ufw
          - fail2ban
        state: present
    - name: Allow SSH through firewall
      ufw:
        rule: allow
        port: '22'
        proto: tcp
    - name: Enable UFW firewall
      ufw:
        state: enabled
        policy: deny
    - name: Ensure fail2ban is running
      service:
        name: fail2ban
        state: started
        enabled: yes
Step 4: Run the Playbook
# Dry run first — see what would change without making changes
ansible-playbook playbooks/site.yml --check
# Apply changes
ansible-playbook playbooks/site.yml
# Limit to specific hosts
ansible-playbook playbooks/site.yml --limit web01
# Run with verbose output for debugging
ansible-playbook playbooks/site.yml -v
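Worth knowing: --check pairs with --diff, which shows exactly what file and template content would change. It makes dry runs far more informative than a bare list of changed tasks:

```shell
# Dry run with a unified diff of every file change the run would make
ansible-playbook playbooks/site.yml --check --diff
```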
Step 5: Organize with Roles
Playbooks grow fast. Once yours passes 50 lines, start extracting repeated logic into roles.
# Create a role scaffold
ansible-galaxy init roles/nginx
A minimal nginx role task file:
# roles/nginx/tasks/main.yml
---
- name: Install nginx
  apt:
    name: nginx
    state: present

- name: Copy nginx configuration
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
  notify: Restart nginx

- name: Ensure nginx is running
  service:
    name: nginx
    state: started
    enabled: yes
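The notify: Restart nginx line expects a matching handler in the role; a minimal handlers file to pair with it:

```yaml
# roles/nginx/handlers/main.yml
---
- name: Restart nginx
  service:
    name: nginx
    state: restarted
```

Handlers only fire when a notifying task actually changes something, so a no-op playbook run won't bounce the service.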
Then reference it in your playbook:
- name: Configure web servers
  hosts: webservers
  become: true
  roles:
    - common
    - nginx
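The role's template task also assumes a templates/nginx.conf.j2 exists. A stripped-down sketch of one (the nginx_listen_port variable is an assumption for illustration, falling back to 80 if unset):

```jinja
# roles/nginx/templates/nginx.conf.j2
user www-data;
events { worker_connections 1024; }
http {
  server {
    listen {{ nginx_listen_port | default(80) }};
    root /var/www/html;
  }
}
```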
One Thing I’d Tell Myself Six Months Ago
Set up a staging environment that mirrors production exactly, and run every playbook there first. I learned this the hard way — I tested a UFW firewall playbook directly on a production server, locked myself out, and spent forty minutes recovering access through the VPS console. --check mode catches most issues, but not all of them. A throwaway VM is worth the five minutes it takes to spin up.
Commit your Ansible code to Git from day one. Playbooks are infrastructure as code — treat them that way. git diff before applying changes, git blame when something breaks. I’ve caught configuration mistakes this way that I’d have completely forgotten about otherwise.
The learning curve is shorter than you’d expect. My first week, I automated the server setup I’d been doing manually for a year. By week four, I’d stopped SSHing into individual servers for routine tasks entirely. Two months in, onboarding a new server went from a half-day job to a coffee break.

