Scaling Database Performance: A Practical Guide to Read Replicas in PostgreSQL and MySQL

Table of Contents

The 2:15 AM Wake-Up Call: When Your Database Hits the Wall

My PagerDuty went off at 2:15 AM on a Tuesday. I opened my laptop to find our primary PostgreSQL instance—an AWS db.m5.2xlarge—pegged at 99% CPU. Application response times had ballooned from a crisp 150ms to a staggering 20 seconds. We weren’t just slow; we were effectively offline.

A marketing email had just reached 50,000 customers. Thousands of users were hitting their dashboards simultaneously, triggering complex SELECT queries for order histories and activity logs. Simultaneously, new signups were flooding the same database with INSERT operations. The single node couldn’t keep up with the combined pressure of heavy writes and massive read volume.

Most relational databases share a common breaking point. A single node has finite CPU cycles and I/O throughput. That night, it became clear that vertical scaling was no longer enough. We needed to decouple our traffic using a Read Replica strategy before the next morning peak hit.

The Resource Contention Trap

In a basic single-instance setup, every query competes for the same memory and disk I/O. Modern engines like PostgreSQL and MySQL are efficient, but they still face the fundamental conflict of write locks versus read performance.

Long-running analytical queries often hold shared locks or consume massive amounts of RAM for sorting. When a high-frequency write operation arrives, it waits in the queue, causing a bottleneck that ripples through the entire application. You can upgrade to a larger server, but you will eventually hit a point where doubling your monthly cloud bill from $800 to $1,600 only yields a 10% performance gain. This is the dead end of vertical scaling.

Choosing Your Scaling Path

We evaluated three primary ways to handle the load:

Vertical Scaling: Increasing RAM and CPU. This is the easiest fix but carries a high price tag and requires downtime during the resize.
Sharding: Distributing data across multiple primary nodes. While powerful, it introduces extreme architectural complexity that most teams don’t need until they reach petabyte-scale data.
Read Replicas: Creating read-only copies that stay in sync with the primary. This offloads roughly 80% of typical web traffic from the main node and provides a failover target for high availability.

Implementing Read Replicas in PostgreSQL

PostgreSQL utilizes Streaming Replication. The primary server streams Write-Ahead Log (WAL) records to the replica, which then applies them in real-time. Here is the configuration we used to stabilize our environment.

1. Configure the Primary Node

First, modify postgresql.conf to allow replication connections and define data retention:

# Enable replication features
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
hot_standby = on

Next, update pg_hba.conf to permit the replica’s specific IP address to connect:

# Allow the replica IP (e.g., 10.0.0.5) to connect
host replication replication_user 10.0.0.5/32 md5

2. Provision the Replica

On the secondary server, use the pg_basebackup tool to clone the data from the primary. This command creates a baseline for the replica to start following the primary’s log stream:

bash
pg_basebackup -h primary-ip -D /var/lib/postgresql/15/main -U replication_user -P -R

The -R flag is the most important part. It generates the standby.signal file, telling Postgres to start in read-only mode and follow the primary immediately.

Implementing Read Replicas in MySQL

MySQL generally uses the Binary Log (binlog) for replication. The primary logs every change, and the replica pulls these events to replay them locally.

1. Primary Setup

Add these lines to your my.cnf or mysqld.cnf file:

[mysqld]
server-id = 1
log-bin = mysql-bin
# Every node must have a unique server-id

2. Replica Setup

Assign a unique ID and enable the relay log on the replica server:

[mysqld]
server-id = 2
relay-log = replica-relay-bin
read_only = 1

Finally, link the replica to the primary via SQL:

sql
CHANGE MASTER TO
  MASTER_HOST='primary-ip',
  MASTER_USER='replica_user',
  MASTER_PASSWORD='secure_password',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=154;
START SLAVE;

Infrastructure is Just the First Step

Spinning up a replica doesn’t automatically fix performance. If you don’t update your application code, your primary database will still hover at 100% CPU while the replica sits idle. You must explicitly route your queries.

The most reliable approach involves defining two distinct database connections in your backend: a Writer for POST/PUT/DELETE requests and a Reader for GET requests.

Python Logic Example (SQLAlchemy)

Instead of using one global session, implement a router to pick the engine based on the operation:

python
class DatabaseRouter:
    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica

    def get_engine(self, is_write=False):
        return self.primary if is_write else self.replica

# Usage
# Creating a record
engine = db_router.get_engine(is_write=True)
# Fetching data
engine = db_router.get_engine(is_write=False)

The Reality of Replication Lag

Replication is usually asynchronous. This means there is a slight delay—often between 10ms and 500ms—between a write on the primary and that data appearing on the replica. We call this Replication Lag.

If a user updates their settings and you immediately redirect them to a profile page that reads from the replica, they might see their old data. This results in confusing support tickets. To prevent this, we route “Read-Your-Own-Writes” queries to the primary for a 15-second window following any update operation.

The Results

Transitioning to a Read Replica architecture changed everything for our infrastructure. By moving heavy SELECT traffic to two smaller replicas, our primary database CPU dropped from 99% to a stable 15%. Query latency for users fell from seconds to under 100ms.

If your database is struggling, look at your read-to-write ratio. Most web applications are 80-90% reads. Offloading that volume is the most sustainable way to grow. You will save money on hardware and, more importantly, you will finally be able to sleep through the night.