PostgreSQL Point-in-Time Recovery (PITR): Restore Your Database to Any Exact Second

Database tutorial - IT technology blog

Why PITR Saved Me (And Why You Need It Before Disaster Strikes)

A few years back, I was working on a SaaS project running PostgreSQL. One of the developers ran an UPDATE without a WHERE clause — the classic nightmare. All 80,000 user records had their status column set to 'inactive'. The last full backup was 6 hours old. We lost 6 hours of real user activity.

That incident pushed me to finally set up Point-in-Time Recovery properly. I’ve worked with MySQL, PostgreSQL, and MongoDB across different projects. Each has its own strengths — but PostgreSQL’s WAL-based PITR stands out as one of the most reliable recovery mechanisms I’ve seen in any RDBMS. It lets you roll back to any specific second, not just to the last backup checkpoint.

Here’s how to set it up correctly, with the lessons I learned the hard way.

What PITR Actually Does Under the Hood

PostgreSQL writes every change to a Write-Ahead Log (WAL) before applying it to data files. This log is sequential, append-only, and extremely durable. PITR works by:

  1. Taking a base backup — a consistent snapshot of your data directory.
  2. Continuously archiving WAL segments to a safe location.
  3. At recovery time, replaying WAL segments from the base backup up to your target timestamp.

You can restore to any point between your base backup and your latest archived WAL — down to the second. That’s fundamentally different from a nightly pg_dump. A dump gives you a snapshot. PITR gives you a time machine.
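Each archived segment, by the way, has a 24-hex-digit name that encodes the timeline ID, log sequence number, and segment number (8 digits each). A quick bash sketch to decode one — the filename here is made up for illustration:

```shell
#!/bin/bash
# Decode a WAL segment filename into its three 8-hex-digit fields:
# timeline ID, log sequence number, segment number.
wal="000000010000000A000000FF"   # hypothetical segment name
tli=$((16#${wal:0:8}))
log=$((16#${wal:8:8}))
seg=$((16#${wal:16:8}))
echo "timeline=$tli log=$log segment=$seg"
# → timeline=1 log=10 segment=255
```

This comes in handy when you're staring at an archive directory and need to know whether the segment sequence is contiguous.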

Installation & Prerequisites

PITR is built into PostgreSQL — no extra packages needed. You just need:

  • PostgreSQL 12+ (examples below use 14, but the concepts apply to any version from 12 on)
  • A separate storage location for WAL archives (local path, NFS, S3, or similar)
  • Enough disk space: WAL segments are 16MB each and pile up fast on busy databases

For this guide, I’ll archive to a local directory (/var/lib/postgresql/wal_archive). In production, use object storage like S3 or a dedicated backup server — never the same disk as your data.

# Create the archive directory
sudo mkdir -p /var/lib/postgresql/wal_archive
sudo chown postgres:postgres /var/lib/postgresql/wal_archive
sudo chmod 700 /var/lib/postgresql/wal_archive

Configuration: Step by Step

Step 1: Enable WAL Archiving in postgresql.conf

Edit your postgresql.conf (usually at /etc/postgresql/14/main/postgresql.conf on Debian/Ubuntu):

sudo nano /etc/postgresql/14/main/postgresql.conf

Set these parameters:

# WAL level must be 'replica' or higher for PITR
wal_level = replica

# Enable archiving
archive_mode = on

# The command to copy WAL segments to your archive location
archive_command = 'test ! -f /var/lib/postgresql/wal_archive/%f && cp %p /var/lib/postgresql/wal_archive/%f'

# How long to wait before archiving an incomplete WAL segment
archive_timeout = 60    # seconds — important for low-traffic databases

The archive_command is a plain shell command. PostgreSQL replaces %p with the source WAL file path and %f with the filename. The test ! -f guard prevents overwriting existing archives. Never remove that check.
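If you outgrow the one-liner, a small wrapper script makes failures explicit and keeps partial copies out of the archive. This is a sketch, not battle-tested production code — archive_wal.sh is a hypothetical name, and the default path matches the setup above:

```shell
#!/bin/bash
# Sketch of a stricter archive_command, e.g. installed as
# /usr/local/bin/archive_wal.sh (hypothetical name) and wired in as:
#   archive_command = '/usr/local/bin/archive_wal.sh %p %f'
# A nonzero exit makes PostgreSQL keep the segment and retry later.

archive_wal() {
  local src="$1" name="$2"
  local dest="${WAL_ARCHIVE_DIR:-/var/lib/postgresql/wal_archive}"

  # Same no-overwrite guard as the one-liner — never clobber an archived segment.
  if [ -e "$dest/$name" ]; then
    echo "archive_wal: $name already archived" >&2
    return 1
  fi

  # Copy under a temp name, then rename: the rename is atomic on the same
  # filesystem, so a crash mid-copy never leaves a truncated segment under
  # the final name.
  cp "$src" "$dest/$name.tmp" && mv "$dest/$name.tmp" "$dest/$name"
}

if [ "$#" -eq 2 ]; then
  archive_wal "$@"
fi
```

The temp-name-then-rename dance matters: a half-copied WAL segment that looks complete is far worse than a retried one.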

Reload PostgreSQL to apply:

sudo systemctl reload postgresql

Step 2: Take a Base Backup

PITR needs a base backup as its starting point — there’s no way around this. Use pg_basebackup:

sudo -u postgres pg_basebackup \
  -D /var/lib/postgresql/base_backup \
  -Ft \
  -z \
  -P \
  -Xs

Flag breakdown:

  • -D — destination directory for the backup
  • -Ft — tar format (easier to move around)
  • -z — gzip compression
  • -P — show progress
  • -Xs — stream WAL during backup (ensures consistency)

One flag to avoid here: -R. It writes a standby.signal into the backup, which is exactly what you want when cloning a replica — but on a PITR restore, a leftover standby.signal takes precedence over recovery.signal and starts the server as a standby instead of running targeted recovery.

Schedule this with cron. I run a full base backup every Sunday at 2am, then rely on archived WAL for everything in between:

# Run base backup every Sunday at 2am
0 2 * * 0 postgres pg_basebackup -D /var/lib/postgresql/base_backup_$(date +\%F) -Ft -z -Xs >> /var/log/pg_basebackup.log 2>&1
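Dated backup directories accumulate just like WAL does. A small sketch (prune_base_backups is a hypothetical helper, assuming the naming scheme from the cron job above) to keep only the newest N base backups:

```shell
#!/bin/bash
# Keep the newest N base_backup_YYYY-MM-DD directories, delete the rest.
prune_base_backups() {
  local root="$1" keep="$2"
  # Lexicographic sort works because the dates are zero-padded ISO format.
  # head -n -N (GNU coreutils) prints all but the last N lines, i.e. the oldest.
  ls -d "$root"/base_backup_* 2>/dev/null | sort | head -n -"$keep" |
    while read -r old; do
      echo "pruning $old"
      rm -rf "$old"
    done
}

# Example: keep the 4 most recent weekly backups
#   prune_base_backups /var/lib/postgresql 4
```

Remember that deleting a base backup also makes any WAL older than your next-oldest backup useless — coordinate this with the WAL retention job below.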

Step 3: Performing Point-in-Time Recovery

Say someone just ran a catastrophic DELETE at 14:37 JST. You need to restore to 2026-04-24 14:35:00 JST — two minutes before the damage. Here’s the exact procedure.

1. Stop PostgreSQL:

sudo systemctl stop postgresql

2. Back up the current (corrupted) data directory — just in case:

sudo mv /var/lib/postgresql/14/main /var/lib/postgresql/14/main.broken

3. Restore the base backup:

sudo mkdir -p /var/lib/postgresql/14/main
sudo tar -xzf /var/lib/postgresql/base_backup/base.tar.gz \
  -C /var/lib/postgresql/14/main
sudo chown -R postgres:postgres /var/lib/postgresql/14/main

4. Create a recovery configuration:

In PostgreSQL 12+, recovery settings live in postgresql.conf (or postgresql.auto.conf). You also need a recovery.signal file to trigger recovery mode — and if a standby.signal was restored along with the backup, delete it first, because it takes precedence over recovery.signal and would start the server as a standby:

# Create the signal file
sudo -u postgres touch /var/lib/postgresql/14/main/recovery.signal

# Add recovery settings
sudo -u postgres tee -a /var/lib/postgresql/14/main/postgresql.auto.conf <<EOF
restore_command = 'cp /var/lib/postgresql/wal_archive/%f %p'
recovery_target_time = '2026-04-24 14:35:00+09'
recovery_target_action = 'pause'   # pause at the target so you can verify before promoting
EOF

The restore_command is the mirror of archive_command — it tells PostgreSQL where to fetch each WAL segment during replay.

5. Start PostgreSQL and watch the logs:

sudo systemctl start postgresql
sudo tail -f /var/log/postgresql/postgresql-14-main.log

You’ll see lines like:

LOG:  starting point-in-time recovery to 2026-04-24 14:35:00+09
LOG:  restored log file "000000010000000000000001" from archive
...
LOG:  recovery stopping before commit of transaction 1234567, time 2026-04-24 14:35:02+09
LOG:  pausing at the end of recovery
HINT:  Execute pg_wal_replay_resume() to promote.

Once you’ve verified the data looks right, promote the instance:

sudo -u postgres psql -c "SELECT pg_wal_replay_resume();"

Verification & Monitoring

Verify Archiving Is Working

Don’t assume archiving works — confirm it. Force a WAL switch right after enabling archiving and check that files actually land in the archive directory:

# Force a WAL switch to trigger archiving immediately
sudo -u postgres psql -c "SELECT pg_switch_wal();"

# Check the archive directory
ls -lh /var/lib/postgresql/wal_archive/

# Check archive status in pg_stat_archiver
sudo -u postgres psql -c "
SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;
"

A non-zero failed_count means your archive_command is silently failing. Fix it immediately. A broken archive is functionally the same as no archive.
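That check is easy to script. Here's a sketch of an alert hook — the threshold logic is factored into a function (check_archiver is my name, not a PostgreSQL one) so you can wire it into any monitoring system; the commented psql call shows how you'd feed it the live counter:

```shell
#!/bin/bash
# Exit 2 (CRITICAL) if pg_stat_archiver reports any archive failures, 0 otherwise.
check_archiver() {
  local failed="${1:-0}"
  if [ "$failed" -gt 0 ]; then
    echo "CRITICAL: $failed WAL archive failures — check archive_command"
    return 2
  fi
  echo "OK: no archive failures"
}

# Feed it the live value, e.g. from cron every 5 minutes:
#   check_archiver "$(sudo -u postgres psql -Atc \
#       'SELECT failed_count FROM pg_stat_archiver;')"
```

Exit code 2 maps cleanly onto the Nagios/Icinga CRITICAL convention, which most alerting pipelines understand.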

Monitor Archive Lag

On busy databases, WAL can be generated faster than it’s archived. Keep an eye on the gap:

sudo -u postgres psql -c "
SELECT
  pg_current_wal_lsn() AS current_lsn,
  last_archived_wal,
  now() - last_archived_time AS archive_lag
FROM pg_stat_archiver;
"

If archive_lag keeps growing, your storage or network can’t keep up. Switch to a faster archive target before it becomes a problem.

Test Recovery Regularly

Most teams skip this. That’s exactly when it bites them. An untested backup is not a backup.

Run a quarterly drill: restore your base backup to a spare server, replay WAL archives, and verify row counts against production snapshots. I keep a small script for this:

#!/bin/bash
# Quick PITR drill — restore to 1 hour ago on a test instance
TARGET_TIME=$(date -d '1 hour ago' '+%Y-%m-%d %H:%M:%S%z')
echo "[PITR DRILL] Target recovery time: $TARGET_TIME"

# ... (restore base backup, configure restore_command, set recovery_target_time)
# Then verify row counts and spot-check critical tables
sudo -u postgres psql -p 5433 mydb -c "SELECT COUNT(*) FROM users;"

Retention Policy

WAL archives pile up fast. A high-write database can generate several gigabytes of WAL per day. Add a cleanup job to drop archives older than your retention window:

# Keep 14 days of WAL archives
find /var/lib/postgresql/wal_archive -type f -mtime +14 -delete

Better yet, use pg_archivecleanup — it only removes segments that are no longer needed relative to a specific backup, so you won’t accidentally delete something still required:

# Remove WAL older than a specific backup's start point
sudo -u postgres pg_archivecleanup /var/lib/postgresql/wal_archive 000000010000000000000020
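You can get that segment name automatically from the backup's backup_label file — every base backup contains one, and its START WAL LOCATION line names the oldest segment the backup depends on. A sketch (the sed pattern and the oldest_wal name are mine; the backup_label format itself is stable):

```shell
#!/bin/bash
# Extract the oldest-needed WAL segment name from a backup_label file,
# e.g. from a line like:
#   START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
oldest_wal() {
  sed -n 's/^START WAL LOCATION: .* (file \([0-9A-F]\{24\}\))$/\1/p' "$1"
}

# Then clean the archive relative to that backup:
#   sudo -u postgres pg_archivecleanup /var/lib/postgresql/wal_archive \
#       "$(oldest_wal /var/lib/postgresql/base_backup/backup_label)"
```

With tar-format backups, extract backup_label from base.tar.gz first (or keep a copy next to the archive when the backup runs).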

Key Lessons From Production

  • Set archive_timeout — without it, a quiet database may go hours without archiving a segment, leaving a large gap at the tail of your recovery window.
  • Archive to off-site storage — if data files and WAL archives share the same disk, one hardware failure wipes both. Minimum viable setup: S3 or a separate server.
  • Alert on pg_stat_archiver.failed_count — wire it into PagerDuty, Grafana, whatever you use. Silent archive failures go unnoticed until 3am when you need those files.
  • Write down your recovery procedure — when an incident hits at 3am, you don’t want to Google the exact syntax for recovery_target_time under pressure. One page, step by step, kept somewhere everyone can reach.
  • Drill before you need it — recovery drills catch configuration drift and storage problems before they turn into incidents.