It’s 2 AM and Your Server Is on Fire
The alert hits your phone. Disk usage on the production server is at 94%. You SSH in, bleary-eyed, and start manually checking directories, deleting old log files one by one, restarting services. An hour later, you’ve fixed it — but you know it’ll happen again next week.
That night, I promised myself I’d write a script to handle it automatically. Not because I’m lazy, but because humans make mistakes at 2 AM and computers don’t. That was the moment bash scripting went from “something I should learn someday” to “something I need right now.”
This is the guide I wish I had then.
Core Concepts: What Bash Scripting Actually Is
A bash script is a plain text file with a list of shell commands, executed top to bottom. That’s it. The same commands you’d type manually in the terminal — just saved to a file and run on demand.
Your First Script
Create a file called hello.sh:
#!/bin/bash
# This is a comment
echo "Server is alive at: $(date)"
echo "Uptime: $(uptime -p)"
The first line #!/bin/bash is the shebang. It tells the OS which interpreter to use — without it, the system might try running your bash script with sh or another shell, breaking things in subtle ways. Make it executable and run it:
chmod +x hello.sh
./hello.sh
Variables
Variables store values you’ll reuse throughout the script. One iron rule: no spaces around the = sign. NAME="prod" works. NAME = "prod" throws an error. Bash treats NAME as a command and tries to execute it.
#!/bin/bash
SERVER_NAME="prod-web-01"
DISK_THRESHOLD=90
echo "Checking server: $SERVER_NAME"
echo "Alert if disk exceeds: ${DISK_THRESHOLD}%"
Use ${VAR} when the variable is immediately followed by other characters. $DISK_THRESHOLD_MB would silently fail — bash looks for a variable literally named DISK_THRESHOLD_MB. ${DISK_THRESHOLD}_MB gives you what you actually want.
Command Substitution
Capture command output directly into a variable using $():
CURRENT_DISK=$(df / | awk 'NR==2 {print $5}' | tr -d '%')
echo "Current disk usage: $CURRENT_DISK%"
Conditionals
Conditionals turn a script from a simple command playback into something that responds to the actual state of your system. Compare values, check file existence, branch logic:
#!/bin/bash
DISK_USAGE=$(df / | awk 'NR==2 {print $5}' | tr -d '%')
THRESHOLD=85
if [ "$DISK_USAGE" -gt "$THRESHOLD" ]; then
echo "WARNING: Disk at ${DISK_USAGE}% — cleaning logs"
find /var/log -name "*.log" -mtime +7 -delete
else
echo "Disk OK: ${DISK_USAGE}%"
fi
Spaces inside [ ] are required — not optional. Write ["$DISK_USAGE" -gt "$THRESHOLD"] and bash throws a cryptic “command not found” error. The shell needs those spaces to parse the brackets as a test command, not a filename.
Loops
Loops let you repeat actions over a list or while a condition holds:
# For loop — iterate over items
for SERVICE in nginx mysql redis; do
systemctl is-active --quiet $SERVICE && echo "$SERVICE: running" || echo "$SERVICE: STOPPED"
done
# While loop — run until condition is false
COUNT=0
while [ $COUNT -lt 5 ]; do
echo "Attempt $COUNT"
COUNT=$((COUNT + 1))
done
Functions
Without functions, a 10-service health check is 60 lines of copy-pasted code. With functions, it’s 20. Define once, call wherever you need it:
#!/bin/bash
check_service() {
local SERVICE_NAME=$1
if systemctl is-active --quiet "$SERVICE_NAME"; then
echo "[OK] $SERVICE_NAME is running"
else
echo "[FAIL] $SERVICE_NAME is down — attempting restart"
systemctl restart "$SERVICE_NAME"
fi
}
check_service nginx
check_service mysql
check_service redis
The local keyword scopes the variable inside the function. Without it, SERVICE_NAME leaks into the global scope and can silently clobber other variables — exactly the kind of bug that surfaces at 2 AM, not during testing.
Hands-On Practice: Build a Real Monitoring Script
This is a script I run nightly via cron on a production Ubuntu 22.04 server with 4GB RAM. Before I had it, tracking down a disk issue meant 20 minutes of manual investigation. Now it logs automatically and I find out about problems in the morning, not at midnight.
The Health Check Script
#!/bin/bash
# server-health.sh — production health check
# Usage: ./server-health.sh [--notify]
set -euo pipefail
# Config
HOSTNAME=$(hostname)
ALERT_EMAIL="[email protected]"
DISK_WARN=85
RAM_WARN=90
LOG_FILE="/var/log/health-check.log"
# Logging helper
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}
# Check disk usage
check_disk() {
local USAGE
USAGE=$(df / | awk 'NR==2 {print $5}' | tr -d '%')
if [ "$USAGE" -gt "$DISK_WARN" ]; then
log "WARN: Disk at ${USAGE}%"
return 1
fi
log "OK: Disk at ${USAGE}%"
return 0
}
# Check RAM usage
check_ram() {
local TOTAL FREE USAGE
TOTAL=$(free | awk '/Mem:/ {print $2}')
FREE=$(free | awk '/Mem:/ {print $4}')
USAGE=$(( (TOTAL - FREE) * 100 / TOTAL ))
if [ "$USAGE" -gt "$RAM_WARN" ]; then
log "WARN: RAM at ${USAGE}%"
return 1
fi
log "OK: RAM at ${USAGE}%"
return 0
}
# Check critical services
check_services() {
local FAILED=0
for SVC in nginx mysql; do
if ! systemctl is-active --quiet "$SVC"; then
log "CRITICAL: $SVC is not running"
FAILED=$((FAILED + 1))
else
log "OK: $SVC running"
fi
done
return $FAILED
}
# Run all checks
log "=== Health check on $HOSTNAME ==="
ALL_OK=true
check_disk || ALL_OK=false
check_ram || ALL_OK=false
check_services || ALL_OK=false
if [ "$ALL_OK" = false ]; then
log "Health check FAILED — review logs"
exit 1
fi
log "All checks passed"
Running It Automatically with Cron
Save the script, make it executable, then schedule it:
chmod +x /opt/scripts/server-health.sh
# Open crontab editor
crontab -e
# Run every 30 minutes
*/30 * * * * /opt/scripts/server-health.sh >> /var/log/health-check.log 2>&1
Debugging When Things Break
Scripts fail. Here’s how to figure out why:
# Run with verbose output — shows each command before executing
bash -x ./server-health.sh
# Check exit code of last command
echo $?
# Add to top of script for strict error handling:
set -euo pipefail
# -e: exit on error
# -u: error on undefined variable
# -o pipefail: catch errors in pipes
set -euo pipefail belongs at the top of every production script. Without it, a script can silently fail halfway through and leave your system in a half-broken state with no log entry to explain why.
Accepting Arguments
Hard-coded paths make scripts brittle. Positional parameters let you pass values at runtime, making the same script reusable across different directories and environments:
#!/bin/bash
# Usage: ./backup.sh /var/www/html /backups
SOURCE_DIR=$1
DEST_DIR=$2
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
if [ $# -ne 2 ]; then
echo "Usage: $0 <source> <destination>"
exit 1
fi
tar -czf "${DEST_DIR}/backup_${TIMESTAMP}.tar.gz" "$SOURCE_DIR"
echo "Backup complete: ${DEST_DIR}/backup_${TIMESTAMP}.tar.gz"
$1 and $2 are the first and second arguments. $# gives the total count. $0 is the script name itself — handy for printing accurate usage messages if someone calls it wrong.
Stop Doing It Manually
Bash scripting isn’t about being clever — it’s about not doing the same thing twice. Every repetitive task you handle manually is a script waiting to be written.
Start small. Pick one thing you do manually every week and automate it. Checking disk space, rotating logs, backing up a directory — doesn’t matter what. Write the script, test it on a non-production machine, then schedule it with cron.
The goal isn’t perfect code. The goal is that next time the disk fills up at 2 AM, your script already handled it before your phone rang.

