PostgreSQL Encryption at Rest: Protecting Sensitive Data with pgcrypto and Column-level Encryption

It was 2:47 AM when I got the Slack message. A misconfigured security group had left a PostgreSQL server exposed to the internet for roughly four hours. The server contained a users table with emails, phone numbers, and government ID numbers stored in plain text. No breach was confirmed — but it didn’t matter. The data was there, unprotected, and if anyone had pulled a backup during that window, all they needed was a pg_restore to read everything.

That incident changed how I approach database security. I’ve run MySQL, PostgreSQL, and MongoDB across different projects — each takes a different approach to data-at-rest protection. PostgreSQL gives you the most granular set of options. The trick is choosing the right one for your threat model, not just enabling whatever checkbox the cloud console offers.

Table of Contents

The Four Encryption Approaches

Before writing any SQL, you need to understand what you’re actually protecting against. Each encryption layer defends against a different attack vector.

1. Full-Disk Encryption (FDE)

LUKS on Linux, BitLocker on Windows. The entire disk partition is encrypted at the block device level. PostgreSQL and the OS see nothing — the kernel handles decryption transparently after boot. Your cloud provider likely has this enabled by default: AWS RDS uses AES-256 at the storage layer if you check the box. Same with Google Cloud SQL and Azure Database for PostgreSQL.

2. Filesystem-Level Encryption

Tools like eCryptfs or fscrypt encrypt at the filesystem layer rather than the block device. You point them at PostgreSQL’s data directory (/var/lib/postgresql/) specifically. Slightly more granular than FDE, but operationally similar.

3. Transparent Data Encryption (TDE)

Enterprise PostgreSQL distributions — EDB Postgres Advanced Server, Percona Distribution for PostgreSQL — offer TDE. It encrypts data files and WAL at the cluster level without changing your SQL. Community PostgreSQL doesn’t ship TDE natively as of version 16, though patches exist. This is a paid-distribution feature.

4. Column-Level Encryption with pgcrypto

This is the layer that actually matters when someone has database credentials. pgcrypto is a PostgreSQL extension that provides cryptographic functions directly in SQL. You encrypt specific columns containing sensitive data. The database stores ciphertext; only the application holding the key can read the original values.

Pros, Cons, and the Real Trade-offs

Full-Disk Encryption
Protects against physical disk theft and stolen cloud snapshots. Zero code changes, trivial to enable on managed services. But it does absolutely nothing if someone gets database-level access — a valid session reads plaintext as normal. If your PostgreSQL credentials leak, FDE offers no protection. It’s defending against “someone stole the hard drive from the data center,” not “someone got a database password.”

Filesystem Encryption
Same limitations as FDE once the database is running. Adds noticeable I/O overhead on high-throughput workloads. The protection window is narrow: offline backups and stolen disks. Not worth the complexity for most teams when FDE is already available.

TDE
Transparent, no application changes, protects data files and WAL backups at rest. The downside: requires a paid distribution or a patched custom build. Adds key management complexity at the server level. Not realistic for teams running community PostgreSQL on a budget.

Column-Level Encryption (pgcrypto)
This is the one that actually stops the 2 AM scenario. A DBA with full table access sees only ciphertext for encrypted columns. A stolen pg_dump backup is useless without the application key. You encrypt only the fields that matter — not the entire table — so the performance hit stays bounded. The cost: application changes, no standard B-tree indexing on encrypted columns, and key management becomes entirely your problem.

The Recommended Setup

For production applications handling PII — Social Security Numbers, national IDs, financial account numbers, medical records — the answer isn’t choosing one approach. It’s layering:

Enable FDE or storage-level encryption at the infrastructure layer. On managed services this is a checkbox during cluster creation — it costs nothing and should always be on.
Use column-level encryption for the specific fields that carry legal liability if exposed.
Store application encryption keys completely separate from the database — never in the same PostgreSQL instance.

Most teams skip column-level encryption. It’s the one they regret. Here’s the implementation.

Implementation Guide: pgcrypto Column-Level Encryption

Step 1: Enable the Extension

-- Run as superuser
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Verify installation
SELECT name, default_version FROM pg_available_extensions WHERE name = 'pgcrypto';

Step 2: Design Your Schema

Encrypted columns must use bytea type, not TEXT. The ciphertext is binary data, and storing it as text causes encoding issues.

CREATE TABLE users (
    id          SERIAL PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE,  -- searchable, not sensitive
    full_name   TEXT,
    ssn_enc     BYTEA,                  -- Social Security Number (encrypted)
    tax_id_enc  BYTEA,                  -- Tax ID (encrypted)
    created_at  TIMESTAMPTZ DEFAULT NOW()
);

Step 3: Encrypting Data on Insert

Use pgp_sym_encrypt() for symmetric encryption. The key comes from your application configuration — never hardcoded in SQL files or migration scripts.

-- Pass the encryption key via a session-level parameter
SET app.encryption_key = 'your-32-byte-random-key-here';

-- Insert with encrypted SSN
INSERT INTO users (email, full_name, ssn_enc)
VALUES (
    '[email protected]',
    'Alice Johnson',
    pgp_sym_encrypt('123-45-6789', current_setting('app.encryption_key'))
);

Step 4: Reading Encrypted Data

-- Decrypt in SELECT — only works with the correct key
SELECT
    email,
    full_name,
    pgp_sym_decrypt(ssn_enc, current_setting('app.encryption_key')) AS ssn
FROM users
WHERE id = 1;

-- Attempting decryption with wrong key throws an error (not silent failure)
SELECT pgp_sym_decrypt(ssn_enc, 'wrong-key') FROM users WHERE id = 1;
-- ERROR:  Wrong key or corrupt data

Step 5: Python Application Integration

import psycopg2
import os

ENCRYPTION_KEY = os.environ.get('DB_ENCRYPTION_KEY')  # From env or vault

conn = psycopg2.connect(os.environ['DATABASE_URL'])
cur = conn.cursor()

# Set key for this session before any encrypted operations
cur.execute("SET app.encryption_key = %s", (ENCRYPTION_KEY,))

# Insert encrypted data
cur.execute("""
    INSERT INTO users (email, full_name, ssn_enc)
    VALUES (%s, %s, pgp_sym_encrypt(%s, current_setting('app.encryption_key')))
""", ('[email protected]', 'Bob Smith', '987-65-4321'))

# Read decrypted data
cur.execute("""
    SELECT email, pgp_sym_decrypt(ssn_enc, current_setting('app.encryption_key'))
    FROM users WHERE email = %s
""", ('[email protected]',))

row = cur.fetchone()
print(f"Email: {row[0]}, SSN: {row[1]}")

conn.commit()
cur.close()
conn.close()

Step 6: Generating a Strong Encryption Key

The pgcrypto encryption is only as strong as the key. For a 256-bit AES-compatible key:

# Generate a cryptographically secure 32-byte key
openssl rand -base64 32
# Output example: tG8k2mP9xLqR4nY7vW1sA3bH6jE0cF5d=

# Or using Python
python3 -c "import secrets; print(secrets.token_urlsafe(32))"

When working with encryption keys locally during setup, I use the Hash Generator at toolcraft.app to quickly verify SHA-256 checksums of key files and configuration exports. The reason I reach for it specifically: it runs entirely client-side in the browser — key material never touches a remote server. That detail matters when you’re handling cryptographic secrets during a late-night setup.

Step 7: Verify the Setup End-to-End

-- Confirm the column stores ciphertext, not readable text
SELECT ssn_enc FROM users WHERE email = '[email protected]';
-- Returns binary data: \xcb6a3d9f2b...

-- Confirm correct key decrypts successfully
SELECT pgp_sym_decrypt(ssn_enc, current_setting('app.encryption_key'))
FROM users WHERE email = '[email protected]';
-- Returns: 123-45-6789

Key Management: Where Most Teams Get This Wrong

The SQL above works. Key management is where production systems quietly fail.

Never store the encryption key in PostgreSQL itself. If it lives in a table, a stored procedure, or a postgresql.conf parameter that gets included in backups, you’ve encrypted the data and left the key in the same box.
Use a secrets manager. AWS Secrets Manager, HashiCorp Vault, or GCP Secret Manager. Even a properly structured .env file excluded from git is better than hardcoding.
Rotate keys on a schedule. Build a re-encryption script that decrypts rows with the old key and re-encrypts with the new one. It’s operational overhead, but PCI-DSS and HIPAA both require key rotation procedures.
Use separate keys per data classification. Different key for SSNs versus financial account numbers. One compromised key doesn’t expose everything.

For generating strong initial keys and passwords during setup, toolcraft.app’s password generator lets you produce cryptographically random strings of any length with full control over character sets. Same deal — it’s client-side only, so nothing you generate is logged anywhere.

Performance Considerations

Column-level encryption has real costs worth knowing upfront:

No B-tree indexing on encrypted columns. You cannot do WHERE ssn_enc = pgp_sym_encrypt('123-45-6789', key) efficiently. The workaround is storing an HMAC of the value in a separate indexed column for exact-match lookups.
CPU overhead on encrypt/decrypt — AES-NI handles individual operations in microseconds, but decrypting tens of thousands of rows during a full-table scan adds measurable latency. Keep bulk analytical queries off encrypted columns where possible.
Storage increase — pgcrypto’s OpenPGP format adds roughly 50-100 bytes of header and salt overhead per value, on top of AES block padding that rounds to 16-byte boundaries. An 11-character SSN encrypts to around 110-130 bytes on disk.

For most applications protecting a handful of sensitive fields, the math is simple. Encrypt the five columns that would end your company if leaked. Leave the rest as normal indexed columns.

The 2 AM Takeaway

Back to that incident. The server already had FDE enabled — didn’t matter, because the hypothetical attacker had database credentials, not physical disk access. Column-level encryption on the government ID columns would have meant the worst-case scenario was ciphertext exposure, not plaintext. A potential breach becomes a defensible incident report instead of a regulatory filing.

Start with three columns: whatever field stores your most sensitive identifier, whatever field stores financial data, and whatever field your compliance team would wake you up about at 2 AM. Enable pgcrypto, migrate those columns to bytea, update the application layer to encrypt on write and decrypt on read. That’s a weekend of work for protection that FDE alone will never give you.