The Nightmare of Ballooning Backup Costs
Early in my career, I managed a small cluster of web servers. Every night, a simple cron job ran a tar command to zip up web directories and database dumps, shipping them to a remote storage server. It worked perfectly for about three weeks. Then, the storage bill arrived.
Even though the actual website data only grew by maybe 50MB a day, my storage usage was skyrocketing by 20GB every night. I was paying to store the same images, the same library files, and the same database structures over and over again. When I finally needed to restore a single 10KB config file from a 50GB archive, I had to download the entire archive. Over a throttled connection, that took three hours of downtime I couldn’t afford.
The Problem with File-Level Backups
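Basic tar scripts and rsync-based snapshot schemes store data inefficiently because they treat each file as a whole unit. Imagine you have a 2GB database dump. If you change just one row, the file counts as changed, and the next backup stores the entire 2GB again. (Rsync’s delta-transfer algorithm can reduce what travels over the network, but every retained version of the file still occupies its full size on disk.) Over a month of nightly backups, that’s 60GB of storage for what is essentially the same data.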
Security is another major headache. Most basic scripts don’t encrypt data by default. If your backup server is compromised, your raw customer data is out in the open. Managing retention—deciding which old backups to delete without breaking the chain—usually requires writing complex, error-prone scripts. You need a system that understands data at a granular level, not just as a collection of files.
Choosing the Right Tool
Several players dominate the Linux backup ecosystem, each serving a different purpose:
- Rsync: This is the gold standard for mirroring directories. However, it doesn’t provide versioning or deduplication out of the box, leading to high storage costs for long-term history.
- Rclone: Think of this as the Swiss Army knife for cloud storage. It’s excellent for moving data to S3 or Google Drive, but it’s a sync tool, not a dedicated archive manager.
- BorgBackup (Borg): This tool uses chunk-level deduplication. It breaks files into small, variable-sized pieces and only stores unique chunks. If ten different files contain the same 1MB block of code, Borg stores it exactly once.
BorgBackup stands out because it packs deduplication, authenticated AES-256 encryption, and compression into a single, fast binary. In my experience, switching from tar to Borg often reduces storage requirements by 70% to 90%.
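To make the chunk idea concrete, here is a toy sketch of my own, not Borg's actual algorithm. It assumes GNU coreutils (split, sha256sum) and uses fixed 64 KiB chunks, whereas Borg uses content-defined, variable-sized chunks, so treat the numbers as illustrative only. It builds a "day 1" file, changes one byte for "day 2," and counts how many distinct chunks a deduplicating store would need to keep.

```shell
#!/bin/bash
# Toy model of chunk-level deduplication. Assumption: FIXED 64 KiB chunks;
# Borg really uses content-defined, variable-sized chunks.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# "Day 1": a deterministic text file of roughly 1.3 MB.
seq 1 200000 > "$work/day1.txt"

# "Day 2": the same file with a single byte overwritten in place.
cp "$work/day1.txt" "$work/day2.txt"
printf 'X' | dd of="$work/day2.txt" bs=1 seek=500000 conv=notrunc 2>/dev/null

# Cut both versions into 64 KiB chunks.
split -b 65536 "$work/day1.txt" "$work/d1_"
split -b 65536 "$work/day2.txt" "$work/d2_"

# A file-level backup stores every chunk of both days;
# a deduplicating store keeps each distinct chunk hash only once.
total=$(ls "$work"/d1_* "$work"/d2_* | wc -l)
unique=$(sha256sum "$work"/d1_* "$work"/d2_* | awk '{print $1}' | sort -u | wc -l)

echo "Chunks referenced: $total, unique chunks stored: $unique"
# → Chunks referenced: 40, unique chunks stored: 21
```

Two full copies reference 40 chunks, but only 21 are distinct: the 20 from day 1 plus the single chunk that contains the changed byte. Fixed-size chunking falls apart as soon as bytes are inserted rather than overwritten, which is the reason real tools prefer content-defined boundaries.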
Step-by-Step: Implementing BorgBackup
After managing 10+ Linux VPS instances over the last few years, I’ve refined a workflow that balances security and ease of use. Here is how to get it running.
1. Installing BorgBackup
Most modern Linux distributions carry Borg in their official repos. On Ubuntu or Debian, it’s a quick install:
sudo apt update
sudo apt install borgbackup -y
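On Fedora, use dnf. On RHEL-compatible systems like AlmaLinux, the package typically comes from the EPEL repository, so enable that first (for example with sudo dnf install epel-release):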
sudo dnf install borgbackup -y
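Distribution packages can lag behind upstream, and some commands differ between releases (borg compact, used later in this article, arrived in Borg 1.2). Here is a small sketch for checking what you actually got. The version_at_least helper is my own name, not part of Borg, and it assumes GNU sort with the -V option.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if version $1 is >= version $2.
# Assumes GNU sort, whose -V option sorts version strings.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v borg >/dev/null 2>&1; then
    # `borg --version` prints something like "borg 1.2.4"; keep the second field.
    installed=$(borg --version | awk '{print $2}')
    if version_at_least "$installed" "1.2.0"; then
        echo "borg $installed: 'borg compact' is available"
    else
        echo "borg $installed: older than 1.2, 'borg compact' is not available"
    fi
else
    echo "borg is not installed or not on PATH"
fi
```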
2. Initializing the Backup Repository
Borg stores data in a “repository,” which is just a directory where it manages its encrypted chunks. You only need to initialize this once. Always use encryption—it’s better to have it and not need it than the other way around.
# Create a directory for backups
mkdir -p /mnt/backups/my_server_repo
# Initialize the repo with encryption
borg init --encryption=repokey /mnt/backups/my_server_repo
Borg will ask for a passphrase. Do not lose this. If you lose this password, your backups are effectively randomized noise that can never be recovered. Store it in a password manager like Bitwarden or 1Password immediately.
3. Creating Your First Backup
Creating an archive is simple. You name the archive (I recommend using a timestamp) and point it at your data.
# Syntax: borg create /path/to/repo::ArchiveName /path/to/data
borg create --stats --progress /mnt/backups/my_server_repo::Monday-Backup /var/www/html /etc/nginx
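The --stats flag is vital for your peace of mind. Once the archive is complete, it prints a summary comparing the “Original size” with the “Deduplicated size,” so you can see exactly how much space you’re saving. The --progress flag is what gives you live feedback while the backup is running.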
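Since a fixed name like Monday-Backup can only be used once per repository, here is a sketch of the timestamp approach. The host-plus-date naming scheme is just my assumption about a sensible convention, and the guard simply skips the backup when borg or the repository is missing.

```shell
#!/bin/bash
# Sketch: build a unique, sortable archive name instead of a fixed one.
REPO='/mnt/backups/my_server_repo'              # repository path used in this article
ARCHIVE="$(uname -n)-$(date +%Y-%m-%d-%H%M)"    # e.g. web01-2024-05-06-0200
echo "Archive name: $ARCHIVE"

# Only attempt the backup if borg and the repository are actually present.
if command -v borg >/dev/null 2>&1 && [ -d "$REPO" ]; then
    borg create --stats "$REPO::$ARCHIVE" /var/www/html /etc/nginx
else
    echo "Skipping borg create: borg or $REPO not found"
fi
```

Borg can also fill in names itself through placeholders, for example '/mnt/backups/my_server_repo::{hostname}-{now:%Y-%m-%d-%H%M}' (quoted so the shell leaves the braces alone).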
4. Browsing and Restoring Data
The real magic happens when you need to retrieve data. Instead of extracting a massive archive, you can mount the backup as a virtual filesystem. This lets you browse files using ls or cp as if they were just sitting on your hard drive. Mounting relies on FUSE support being available on the machine.
mkdir /tmp/restore_point
borg mount /mnt/backups/my_server_repo::Monday-Backup /tmp/restore_point
# Now you can browse /tmp/restore_point and copy only what you need.
# When finished, unmount it:
borg umount /tmp/restore_point
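If FUSE is not available, or you already know which file you want, borg extract can pull a single path out of an archive. The sketch below assumes the archive from the earlier step; the file path is hypothetical and only there for illustration.

```shell
#!/bin/bash
# Sketch: restore one file with `borg extract` instead of mounting.
REPO='/mnt/backups/my_server_repo'
ARCHIVE='Monday-Backup'
# Borg stores paths without the leading slash.
WANTED='etc/nginx/nginx.conf'    # hypothetical file inside the archive

if command -v borg >/dev/null 2>&1 && [ -d "$REPO" ]; then
    restore_dir=$(mktemp -d)
    # borg extract writes into the current directory, so change into a scratch dir first.
    (cd "$restore_dir" && borg extract "$REPO::$ARCHIVE" "$WANTED")
    echo "Restored to $restore_dir/$WANTED"
else
    echo "Skipping restore: borg or $REPO not found"
fi
```

Extracting into a scratch directory rather than over the live path lets you inspect the file before putting it back.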
5. Automating the Process
Manual backups are a recipe for disaster. You need a script that handles the heavy lifting and cleans up after itself. Here is a template to start from:
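#!/bin/bash
# Stop at the first failing command so the success message is only printed when every step worked.
# Note: Borg also exits non-zero for warnings, which will stop the script too.
set -euo pipefail
export BORG_REPO='/mnt/backups/my_server_repo'
# The passphrase is stored in plain text here, so keep this file readable by root only.
export BORG_PASSPHRASE='your_secure_passphrase_here'
# Create a new backup with a precise timestamp
echo "Starting backup..."
borg create ::"$(date +%Y-%m-%d-%H%M)" /var/www/html /etc
# Keep the last 7 daily, 4 weekly, and 6 monthly backups
echo "Pruning old backups..."
borg prune -v --list --keep-daily=7 --keep-weekly=4 --keep-monthly=6
# Free up space from deleted chunks (borg compact requires Borg 1.2 or newer)
borg compact
echo "Backup completed successfully."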
Save this as /usr/local/bin/backup.sh, make it executable with chmod 700 (so that only its owner can read the embedded passphrase), and set a cron job to run it at 2:00 AM daily:
0 2 * * * /usr/local/bin/backup.sh >> /var/log/borg-backup.log 2>&1
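If you would rather not keep the passphrase inside the script at all, Borg can obtain it from a command via the BORG_PASSCOMMAND environment variable. Below is a minimal sketch of that idea. The temporary file stands in for a root-only file such as /root/.borg-passphrase, which is a path I made up for illustration, and stat -c assumes GNU coreutils.

```shell
#!/bin/bash
# Sketch: read the passphrase from a locked-down file via BORG_PASSCOMMAND.
# A temp file stands in for a hypothetical /root/.borg-passphrase.
set -euo pipefail

pass_file=$(mktemp)
trap 'rm -f "$pass_file"' EXIT
printf '%s\n' 'your_secure_passphrase_here' > "$pass_file"
chmod 600 "$pass_file"    # owner read/write only

# Borg runs this command and uses its standard output as the passphrase.
export BORG_PASSCOMMAND="cat $pass_file"

# Confirm the file is not readable by group or others.
perms=$(stat -c '%a' "$pass_file")
echo "Passphrase file mode: $perms"
```

In the backup script, the export BORG_PASSPHRASE line would then be replaced by the export BORG_PASSCOMMAND line pointing at the real file.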
Hard-Won Lessons for Production
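Setting up the script is only half the battle. I learned the hard way that a backup you haven’t tested is just a dream. Run borg check once a month to verify that the repository is consistent and that your storage hardware hasn’t corrupted the underlying data chunks; adding --verify-data makes the check slower but also verifies the contents of every chunk. A check is not a substitute for an actual test restore, so do both.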
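For the test restore itself, the core step is comparing what came out of the backup with what you expected. Here is a minimal sketch of that comparison. The same_content helper is my own name, the two temp files are stand-ins so the example runs anywhere, and sha256sum assumes GNU coreutils. In practice the second file would be one you pulled out of an archive.

```shell
#!/bin/bash
# Sketch of a restore test: compare a restored copy against the live file.
set -euo pipefail

# Hypothetical helper: succeeds only if both files have the same SHA-256 digest.
same_content() {
    [ "$(sha256sum < "$1")" = "$(sha256sum < "$2")" ]
}

# Stand-in files; "restored" would normally come from borg extract or borg mount.
live=$(mktemp)
restored=$(mktemp)
trap 'rm -f "$live" "$restored"' EXIT
echo 'server_name example.com;' > "$live"
cp "$live" "$restored"

if same_content "$live" "$restored"; then
    echo "Restore verified: contents match"
else
    echo "MISMATCH: restored file differs from the live copy" >&2
fi
# → Restore verified: contents match
```

Keep in mind that a live file may legitimately have changed since the backup ran, so a mismatch is a prompt to investigate rather than proof of corruption.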
Also, remember the 3-2-1 rule: keep three copies of your data, on two different types of media, with one copy off-site. If your server’s disk dies and your Borg repo is on that same disk, you have nothing. After my Borg script finishes, I use rclone to sync the entire repository to an off-site S3 bucket. Because Borg encrypts everything locally, your data remains private even if it’s sitting on a third-party cloud provider’s server.
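The off-site step can be sketched roughly like this. The remote name and bucket path are hypothetical and assume you have already configured an S3 remote with rclone config; the guard skips the sync when rclone or the repository is missing.

```shell
#!/bin/bash
# Sketch: mirror the local Borg repository to an off-site remote with rclone.
REPO='/mnt/backups/my_server_repo'
REMOTE='offsite-s3:my-backup-bucket/borg'    # hypothetical rclone remote and bucket

if command -v rclone >/dev/null 2>&1 && [ -d "$REPO" ]; then
    # sync makes the destination match the source, including deletions.
    rclone sync "$REPO" "$REMOTE"
else
    echo "Skipping off-site sync: rclone or $REPO not found"
fi
```

Because sync mirrors deletions too, a damaged or emptied local repository would be copied faithfully to the remote. Enabling versioning on the bucket is one way to guard against that.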
The Bottom Line
BorgBackup eliminates the “storage tax” of traditional backups by focusing on unique data chunks rather than whole files. By implementing deduplication, you can keep a deep history of your server’s state for a fraction of the usual cost. Start by setting up a local repo, test your restore process twice, and then automate it to keep your data safe without draining your budget.

