When the Disk Fills Up at 2 AM
You get a midnight alert: your web application is throwing 500 errors. SSH in, and the first thing you see is No space left on device. The database won’t write. Nginx can’t create temp files. Everything is broken — because someone (maybe a runaway log process, maybe you) filled the disk.
I’ve been there — 3 AM, fingers flying, cursing a process I didn’t even know was running. Three years and a dozen VPS instances later, I now set up disk monitoring on every new server before anything else runs. But when the alert is already firing, here’s what actually works.
Understanding Why Disks Fill Up
Before jumping to rm -rf everything, it helps to understand what actually eats disk space on a Linux server. The usual suspects:
- Log files — Application logs, system logs, and journal files that grow unbounded
- Docker — Unused images, stopped containers, and dangling volumes accumulate silently
- Package cache — apt/yum caches old packages after updates
- Temporary files — Some apps write to /tmp or /var/tmp and never clean up
- Database dumps and backups — Old backup files that nobody deleted
- Core dumps — A crashing process can dump gigabytes of memory to disk
Here’s a gotcha that burns people constantly: df says the disk is full, but du can’t find the files taking space. This usually means a deleted file is still held open by a running process — the inode stays alive, the space stays consumed. It doesn’t free until that process closes the file handle. Classic trap.
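You can reproduce the trap safely with a throwaway file (nothing here touches real logs; this relies on Linux's /proc filesystem):

```shell
# Reproduce the deleted-but-still-open trap with a temp file
tmpfile=$(mktemp)
dd if=/dev/zero of="$tmpfile" bs=1M count=10 2>/dev/null
tail -f "$tmpfile" >/dev/null &   # this process now holds the file open
holder=$!
sleep 1                           # give tail a moment to open the file
rm "$tmpfile"                     # du no longer sees it, but the space is still used
found=$(ls -l "/proc/$holder/fd" 2>/dev/null | grep -c deleted)
echo "deleted-but-open handles: $found"
kill "$holder"                    # closing the handle is what frees the space
```

Until that `kill`, df counts the 10MB while du finds nothing, which is exactly the 2 AM head-scratcher.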
Core Commands You Need to Know
Check Overall Disk Usage
# Human-readable disk space overview
df -h
# Focus on a specific filesystem
df -h /var
Find What’s Eating Space
# Top-level disk usage in /var, sorted by size
du -sh /var/* 2>/dev/null | sort -rh | head -20
# Drill down further
du -sh /var/log/* 2>/dev/null | sort -rh | head -10
The sort -rh flag sorts human-readable sizes correctly — 1.2G sorts above 900M. Drop the -h and you’ll get garbage ordering, chasing the wrong directories while your disk stays full.
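Two sample sizes are enough to see the difference:

```shell
# Human-numeric sort understands unit suffixes; plain sort compares text
printf '900M\n1.2G\n' | sort -rh   # 1.2G first, as expected
printf '900M\n1.2G\n' | sort -r    # 900M first: '9' sorts after '1' as text
```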
Find Large Files Directly
# Find files larger than 100MB anywhere on the system
find / -xdev -type f -size +100M 2>/dev/null
# More readable version with sizes
find / -xdev -type f -size +100M -exec du -sh {} \; 2>/dev/null | sort -rh
The -xdev flag keeps the search within the current filesystem and doesn’t cross mount points — useful when you want to stay on / and not accidentally traverse NFS mounts.
Hands-On: Recovering Space Quickly
1. Clean Up Journal Logs
On systemd-based distros (Ubuntu, Debian, CentOS 7+, AlmaLinux), journald can quietly reach 2–5GB on a busy server. You probably won’t notice until the disk is gone.
# Check how much journal is using
journalctl --disk-usage
# Keep only the last 200MB of logs
journalctl --vacuum-size=200M
# Or keep only logs from the last 2 weeks
journalctl --vacuum-time=2weeks
Cap it permanently in /etc/systemd/journald.conf:
SystemMaxUse=500M
SystemKeepFree=1G
Then restart journald: systemctl restart systemd-journald
2. Truncate (Not Delete) Active Log Files
If a process has a log file open, deleting it won’t free space — the inode stays alive. Truncating it does work:
# Safely truncate a log file that's actively being written
> /var/log/nginx/access.log
# Equivalent using truncate command
truncate -s 0 /var/log/nginx/error.log
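If you want to convince yourself truncation is safe before doing it to a production log, here's a sketch using a temp file and a held-open file descriptor standing in for a daemon:

```shell
# Sanity-check truncation against an open handle (temp file, not a real log)
logfile=$(mktemp)
echo "old entries" > "$logfile"
exec 3>> "$logfile"             # fd 3 holds the file open, like a daemon would
: > "$logfile"                  # truncate in place; the open handle stays valid
echo "new entry" >&3            # the holder keeps writing without errors
exec 3>&-
content=$(cat "$logfile")
echo "$content"                 # old entries gone, new entry survived
rm "$logfile"
```

Note that `: > file` is handy when plain `> file` is rejected (e.g. under some shells' noclobber settings), and `sudo > file` never works anyway because the redirect happens in your own shell before sudo runs.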
For the deleted-but-still-open file problem, find the process holding the handle:
# Find processes with deleted files still open
lsof | grep '(deleted)'
Restarting that process releases the handle and actually frees the space.
3. Clean Docker Artifacts
On servers running Docker, this is often the biggest win:
# See what Docker is using
docker system df
# Remove stopped containers, unused networks, dangling images, build cache
docker system prune
# Also remove unused volumes (careful — double-check first!)
docker system prune --volumes
# Remove only dangling images
docker image prune
# Remove all images not used by any container
docker image prune -a
I’ve recovered 20–40GB in a single docker system prune -a on build servers that hadn’t been cleaned in months. Just make sure no containers are actively using images you’re about to remove.
4. Clean Package Manager Cache
# Debian/Ubuntu
apt clean # Removes cached .deb packages
apt autoremove # Removes orphaned packages
# RHEL/CentOS/AlmaLinux
yum clean all
dnf clean all
On a recently updated Ubuntu server, apt clean alone can recover 500MB–1GB. Not dramatic, but free space is free space — and it takes two seconds.
5. Find and Handle Core Dumps
# Check for core dumps
ls -lh /var/lib/systemd/coredump/
# Clean them out
coredumpctl list
rm /var/lib/systemd/coredump/*
# Or disable core dumps entirely in /etc/security/limits.conf
# * hard core 0
6. Check Inode Usage
Sometimes df -h shows plenty of space but writes still fail. The culprit is often inodes, not bytes:
df -i
# Shows inode usage % per filesystem
# If Use% is at 100%, you've run out of inodes
Millions of tiny files can exhaust the inode table long before disk bytes run out. Mailer queues, PHP session files, and application caches are the usual offenders. Find which directory is responsible:
# Find directories with the most files
for i in /var/*; do echo "$(find "$i" -type f 2>/dev/null | wc -l) $i"; done | sort -rn | head -10
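For repeated use, the same loop works as a small function. count_files is my name for it, not a standard tool:

```shell
# Count regular files under each immediate subdirectory, biggest first
# Usage: count_files /var   (prints "<file count> <dir>/" per subdirectory)
count_files() {
  for d in "$1"/*/; do
    printf '%s %s\n' "$(find "$d" -type f 2>/dev/null | wc -l)" "$d"
  done | sort -rn
}

count_files /var 2>/dev/null | head -10
```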
Setting Up Monitoring So You Catch This Early
The real lesson from every disk-full incident isn’t “how do I fix it” — it’s “why didn’t I know sooner.” A simple cron job alerting at 80% costs you 10 minutes to set up and saves you from exactly this situation:
#!/bin/bash
# /usr/local/bin/disk-check.sh
THRESHOLD=80
USAGE=$(df -P / | awk 'NR==2 {print $5}' | tr -d '%')
if [ "$USAGE" -ge "$THRESHOLD" ]; then
echo "ALERT: Disk usage on $(hostname) is at ${USAGE}%" | \
mail -s "Disk Warning: $(hostname)" [email protected]
fi
# Add to crontab to check every hour
crontab -e
# Add: 0 * * * * /usr/local/bin/disk-check.sh
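The awk parsing at the heart of that script can be exercised on canned df output, so you can verify the threshold logic without actually filling a disk. check_threshold is a made-up helper name for this sketch:

```shell
# check_threshold reads `df -P`-style output on stdin and prints
# ALERT or OK plus the usage number for the first data row
check_threshold() {
  awk -v t="$1" 'NR==2 { gsub("%","",$5); print (($5+0 >= t) ? "ALERT " : "OK ") $5 }'
}

# Canned two-line df output lets you test both sides of the threshold
sample='Filesystem 1024-blocks Used Available Capacity Mounted-on
/dev/sda1 10000 9100 900 91% /'
printf '%s\n' "$sample" | check_threshold 80   # prints: ALERT 91
printf '%s\n' "$sample" | check_threshold 95   # prints: OK 91
```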
For multiple servers, Netdata, Prometheus + Grafana, or Zabbix agents handle disk monitoring at scale. Single VPS? The cron script above is enough. Ship it.
A Checklist for When You’re Under Fire
When the alarm is going off and you need space now, run through this in order:
1. Run df -h to confirm which filesystem is full
2. Run du -sh /var/* | sort -rh | head -20 to find the biggest directories
3. Check for deleted-but-open files: lsof | grep deleted
4. Truncate or rotate the largest active log file
5. Run journalctl --vacuum-size=200M
6. If Docker is present: docker system prune
7. Run apt clean or dnf clean all
8. Check inode usage: df -i
Steps 2–5 alone usually recover enough space to stabilize the system. Then you can breathe, investigate the root cause, and fix it properly.
Preventing the Next Incident
Every disk-full crisis I’ve seen followed the same pattern: something grew unchecked for weeks, nobody noticed, then everything broke at once. One-time cleanups don’t fix that. Permanent changes do:
- Configure logrotate for all application logs — check /etc/logrotate.d/ for existing configs and add any missing ones
- Set SystemMaxUse in journald config
- Run docker system prune weekly via cron on build/CI servers
- Add disk usage to your monitoring stack — whatever you use, make sure it pages you at 80%
- Review backup retention policies — old backups sitting on the same disk as your application is a disaster waiting to happen
Disk space management isn’t glamorous. But 30 minutes of setup today prevents a 2 AM firefight six months from now. Trust me on that one.

