Bash Shell Scripting for Beginners: Automate Your Way Out of Production Chaos

Table of Contents

It’s 2 AM and Your Server Is on Fire

The alert hits your phone. Disk usage on the production server is at 94%. You SSH in, bleary-eyed, and start manually checking directories, deleting old log files one by one, restarting services. An hour later, you’ve fixed it — but you know it’ll happen again next week.

That night, I promised myself I’d write a script to handle it automatically. Not because I’m lazy, but because humans make mistakes at 2 AM and computers don’t. That was the moment bash scripting went from “something I should learn someday” to “something I need right now.”

This is the guide I wish I had then.

Core Concepts: What Bash Scripting Actually Is

A bash script is a plain text file with a list of shell commands, executed top to bottom. That’s it. The same commands you’d type manually in the terminal — just saved to a file and run on demand.

Your First Script

Create a file called hello.sh:

#!/bin/bash
# This is a comment
echo "Server is alive at: $(date)"
echo "Uptime: $(uptime -p)"

The first line #!/bin/bash is the shebang. It tells the OS which interpreter to use — without it, the system might try running your bash script with sh or another shell, breaking things in subtle ways. Make it executable and run it:

chmod +x hello.sh
./hello.sh

Variables

Variables store values you’ll reuse throughout the script. One iron rule: no spaces around the = sign. NAME="prod" works. NAME = "prod" throws an error. Bash treats NAME as a command and tries to execute it.

#!/bin/bash
SERVER_NAME="prod-web-01"
DISK_THRESHOLD=90

echo "Checking server: $SERVER_NAME"
echo "Alert if disk exceeds: ${DISK_THRESHOLD}%"

Use ${VAR} when the variable is immediately followed by other characters. $DISK_THRESHOLD_MB would silently fail — bash looks for a variable literally named DISK_THRESHOLD_MB. ${DISK_THRESHOLD}_MB gives you what you actually want.

Command Substitution

Capture command output directly into a variable using $():

CURRENT_DISK=$(df / | awk 'NR==2 {print $5}' | tr -d '%')
echo "Current disk usage: $CURRENT_DISK%"

Conditionals

Conditionals turn a script from a simple command playback into something that responds to the actual state of your system. Compare values, check file existence, branch logic:

#!/bin/bash
DISK_USAGE=$(df / | awk 'NR==2 {print $5}' | tr -d '%')
THRESHOLD=85

if [ "$DISK_USAGE" -gt "$THRESHOLD" ]; then
    echo "WARNING: Disk at ${DISK_USAGE}% — cleaning logs"
    find /var/log -name "*.log" -mtime +7 -delete
else
    echo "Disk OK: ${DISK_USAGE}%"
fi

Spaces inside [ ] are required — not optional. Write ["$DISK_USAGE" -gt "$THRESHOLD"] and bash throws a cryptic “command not found” error. The shell needs those spaces to parse the brackets as a test command, not a filename.

Loops

Loops let you repeat actions over a list or while a condition holds:

# For loop — iterate over items
for SERVICE in nginx mysql redis; do
    systemctl is-active --quiet $SERVICE && echo "$SERVICE: running" || echo "$SERVICE: STOPPED"
done

# While loop — run until condition is false
COUNT=0
while [ $COUNT -lt 5 ]; do
    echo "Attempt $COUNT"
    COUNT=$((COUNT + 1))
done

Functions

Without functions, a 10-service health check is 60 lines of copy-pasted code. With functions, it’s 20. Define once, call wherever you need it:

#!/bin/bash

check_service() {
    local SERVICE_NAME=$1
    if systemctl is-active --quiet "$SERVICE_NAME"; then
        echo "[OK] $SERVICE_NAME is running"
    else
        echo "[FAIL] $SERVICE_NAME is down — attempting restart"
        systemctl restart "$SERVICE_NAME"
    fi
}

check_service nginx
check_service mysql
check_service redis

The local keyword scopes the variable inside the function. Without it, SERVICE_NAME leaks into the global scope and can silently clobber other variables — exactly the kind of bug that surfaces at 2 AM, not during testing.

Hands-On Practice: Build a Real Monitoring Script

This is a script I run nightly via cron on a production Ubuntu 22.04 server with 4GB RAM. Before I had it, tracking down a disk issue meant 20 minutes of manual investigation. Now it logs automatically and I find out about problems in the morning, not at midnight.

The Health Check Script

#!/bin/bash
# server-health.sh — production health check
# Usage: ./server-health.sh [--notify]

set -euo pipefail

# Config
HOSTNAME=$(hostname)
ALERT_EMAIL="[email protected]"
DISK_WARN=85
RAM_WARN=90
LOG_FILE="/var/log/health-check.log"

# Logging helper
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}

# Check disk usage
check_disk() {
    local USAGE
    USAGE=$(df / | awk 'NR==2 {print $5}' | tr -d '%')
    if [ "$USAGE" -gt "$DISK_WARN" ]; then
        log "WARN: Disk at ${USAGE}%"
        return 1
    fi
    log "OK: Disk at ${USAGE}%"
    return 0
}

# Check RAM usage
check_ram() {
    local TOTAL FREE USAGE
    TOTAL=$(free | awk '/Mem:/ {print $2}')
    FREE=$(free | awk '/Mem:/ {print $4}')
    USAGE=$(( (TOTAL - FREE) * 100 / TOTAL ))
    if [ "$USAGE" -gt "$RAM_WARN" ]; then
        log "WARN: RAM at ${USAGE}%"
        return 1
    fi
    log "OK: RAM at ${USAGE}%"
    return 0
}

# Check critical services
check_services() {
    local FAILED=0
    for SVC in nginx mysql; do
        if ! systemctl is-active --quiet "$SVC"; then
            log "CRITICAL: $SVC is not running"
            FAILED=$((FAILED + 1))
        else
            log "OK: $SVC running"
        fi
    done
    return $FAILED
}

# Run all checks
log "=== Health check on $HOSTNAME ==="
ALL_OK=true

check_disk || ALL_OK=false
check_ram || ALL_OK=false
check_services || ALL_OK=false

if [ "$ALL_OK" = false ]; then
    log "Health check FAILED — review logs"
    exit 1
fi

log "All checks passed"

Running It Automatically with Cron

Save the script, make it executable, then schedule it:

chmod +x /opt/scripts/server-health.sh

# Open crontab editor
crontab -e

# Run every 30 minutes
*/30 * * * * /opt/scripts/server-health.sh >> /var/log/health-check.log 2>&1

Debugging When Things Break

Scripts fail. Here’s how to figure out why:

# Run with verbose output — shows each command before executing
bash -x ./server-health.sh

# Check exit code of last command
echo $?

# Add to top of script for strict error handling:
set -euo pipefail
# -e: exit on error
# -u: error on undefined variable
# -o pipefail: catch errors in pipes

set -euo pipefail belongs at the top of every production script. Without it, a script can silently fail halfway through and leave your system in a half-broken state with no log entry to explain why.

Accepting Arguments

Hard-coded paths make scripts brittle. Positional parameters let you pass values at runtime, making the same script reusable across different directories and environments:

#!/bin/bash
# Usage: ./backup.sh /var/www/html /backups

SOURCE_DIR=$1
DEST_DIR=$2
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

if [ $# -ne 2 ]; then
    echo "Usage: $0 <source> <destination>"
    exit 1
fi

tar -czf "${DEST_DIR}/backup_${TIMESTAMP}.tar.gz" "$SOURCE_DIR"
echo "Backup complete: ${DEST_DIR}/backup_${TIMESTAMP}.tar.gz"

$1 and $2 are the first and second arguments. $# gives the total count. $0 is the script name itself — handy for printing accurate usage messages if someone calls it wrong.

Stop Doing It Manually

Bash scripting isn’t about being clever — it’s about not doing the same thing twice. Every repetitive task you handle manually is a script waiting to be written.

Start small. Pick one thing you do manually every week and automate it. Checking disk space, rotating logs, backing up a directory — doesn’t matter what. Write the script, test it on a non-production machine, then schedule it with cron.

The goal isn’t perfect code. The goal is that next time the disk fills up at 2 AM, your script already handled it before your phone rang.