The 2 AM PagerDuty Wake-up Call
It was 2:14 AM when my phone started screaming. PagerDuty reported that our primary production database CPU had pinned at 100%. It stayed there for ten straight minutes. Web requests were timing out. Our checkout flow was dead. I scrambled to my laptop, bleary-eyed, and ran a quick SHOW PROCESSLIST; on our MySQL instance.
I found the culprit instantly. A marketing analyst was running a “quick” query. It calculated average order values per region, grouped by month, across five years of history. That single command triggered a full table scan on 50 million rows. It locked critical tables and starved the application’s simple INSERT and UPDATE operations.
This is a classic mistake. We were treating an OLTP system like an OLAP system. I’ve worked with MySQL, PostgreSQL, and MongoDB on dozens of projects. Each has unique strengths. However, asking one database to handle high-frequency transactions and heavy analytical processing is a guaranteed recipe for disaster.
The Architectural Mismatch
The problem isn’t just a “slow query.” It is a fundamental mismatch in how data is stored. Traditional databases like PostgreSQL or MySQL are built for Online Transactional Processing (OLTP). They excel at thousands of tiny, lightning-fast operations. Think of inserting a new user, updating a password, or fetching one specific order by its ID.
Analytical queries—like calculating total revenue for the last decade—require Online Analytical Processing (OLAP). These queries touch millions of records to aggregate data. OLTP databases store data in rows. To sum up a single price column, the engine must read every single column of every row from the disk. This wastes I/O, hogs memory, and chokes the CPU.
OLTP: The Backbone of Your User Experience
OLTP handles the “here and now.” It powers your live application. When a customer clicks “Buy,” they trigger an OLTP operation. The priority here is concurrency and integrity. You need the database to ensure two people don’t buy the last item in stock. This is where ACID compliance becomes vital.
Key Metrics for OLTP:
- Query Type: High-volume INSERT, UPDATE, and DELETE operations.
- Latency: Should be under 20–50 milliseconds.
- Data Volume: Usually just a few kilobytes per transaction.
- Storage: Row-oriented (optimized for finding specific records).
Here is a typical OLTP transaction in a PostgreSQL environment:
```sql
-- Processing a user purchase
BEGIN;
UPDATE accounts SET balance = balance - 89.99 WHERE user_id = 5501;
INSERT INTO order_history (user_id, amount, status) VALUES (5501, 89.99, 'completed');
COMMIT;
```
This is efficient. The database uses an index on user_id to find one specific row, locks it, updates it, and moves on.
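To see why ACID matters here, consider what happens when the debit succeeds but the order insert fails: you want both to vanish together. Here is a minimal sketch of that all-or-nothing behavior using Python's built-in sqlite3 module as a stand-in for the production database (the table names mirror the SQL above; the starting balance is made up):

```python
import sqlite3

# In-memory database standing in for the OLTP store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (user_id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("CREATE TABLE order_history (user_id INTEGER, amount REAL, status TEXT)")
conn.execute("INSERT INTO accounts VALUES (5501, 200.00)")
conn.commit()

def process_purchase(conn, user_id, amount):
    """Debit the account and record the order atomically."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE accounts SET balance = balance - ? "
                "WHERE user_id = ? AND balance >= ?",
                (amount, user_id, amount),
            )
            if cur.rowcount == 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "INSERT INTO order_history (user_id, amount, status) "
                "VALUES (?, ?, 'completed')",
                (user_id, amount),
            )
        return True
    except ValueError:
        return False

print(process_purchase(conn, 5501, 89.99))   # True: balance covers it
print(process_purchase(conn, 5501, 500.00))  # False: rolled back, no order row
```

The second call leaves no partial state behind: the failed debit and the order row disappear together, which is exactly the guarantee the checkout flow depends on.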
OLAP: The Decision-Making Factory
OLAP is about the “big picture.” Data scientists and business analysts use it to spot trends. You don’t use OLAP to run your website. Instead, you use it to decide which inventory to stock next quarter. These systems usually live in Data Warehouses.
Key Metrics for OLAP:
- Query Type: Complex SELECT statements with heavy JOINs and GROUP BYs.
- Throughput: Can scan terabytes of data in seconds.
- Latency: Ranges from seconds to minutes depending on complexity.
- Storage: Column-oriented (optimized for aggregations).
An OLAP query looks like this:
```sql
-- Analyzing year-over-year growth by product category
SELECT
    EXTRACT(YEAR FROM order_date) AS year,
    category_name,
    SUM(total_price) AS revenue
FROM sales_data_warehouse
WHERE order_date > '2018-01-01'
GROUP BY 1, 2
ORDER BY 1 DESC;
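If SQL's "group by, then aggregate" shape feels abstract, here is the same rollup written out in plain Python over a handful of synthetic rows (the data is invented for illustration, since we obviously don't have the warehouse's contents here):

```python
from collections import defaultdict

# Synthetic stand-ins for rows in sales_data_warehouse.
sales = [
    ("2023-03-14", "Electronics", 499.00),
    ("2023-07-02", "Books", 19.99),
    ("2024-01-20", "Electronics", 899.00),
    ("2024-05-11", "Books", 24.99),
]

# GROUP BY year, category; SUM(total_price)
revenue = defaultdict(float)
for order_date, category, total_price in sales:
    if order_date > "2018-01-01":       # the WHERE clause
        year = int(order_date[:4])      # EXTRACT(YEAR FROM order_date)
        revenue[(year, category)] += total_price

for (year, category), total in sorted(revenue.items(), reverse=True):
    print(year, category, round(total, 2))
```

The loop touches every row to build each aggregate, which is exactly why this class of query is so expensive on a row store holding millions of records.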
Row-Based vs. Columnar Storage
To understand why systems like BigQuery, Snowflake, or ClickHouse are so much faster for big data, look at the disk storage. In a Row-oriented database, data is packed together by record:
```
Row 1: [ID, Name, Email, Address, Balance]
Row 2: [ID, Name, Email, Address, Balance]
```
If you only need to sum the Balance, the database still reads the Name, Email, and Address. They are physically stuck together on the disk. This is a massive waste of hardware resources.
In a Column-oriented database, data is stored by attribute:
```
Column ID:      [1, 2, ...]
Column Name:    [Alice, Bob, ...]
Column Balance: [100.00, 50.00, ...]
```
To calculate the total Balance, the engine only reads the Balance block. It ignores everything else. This allows an OLAP system to scan a billion rows in the time it takes an OLTP system to scan a million.
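You can model the difference in a few lines of Python. This is a toy sketch, not a real storage engine: the dicts stand in for row pages, the list for a column block, and the repr lengths are a crude proxy for bytes read off disk.

```python
# Row-oriented layout: every record keeps all its fields together,
# so summing one field still drags every other field along.
rows = [
    {"id": i, "name": f"user{i}", "email": f"user{i}@example.com",
     "address": "123 Main St", "balance": float(i)}
    for i in range(1000)
]

# Column-oriented layout: each attribute lives in its own contiguous block.
balances = [float(i) for i in range(1000)]

row_sum = sum(r["balance"] for r in rows)   # touches every field of every row
col_sum = sum(balances)                     # touches one block only

row_bytes = sum(len(repr(r)) for r in rows)  # crude proxy for row-store I/O
col_bytes = len(repr(balances))              # crude proxy for column-store I/O

print(row_sum == col_sum)       # True: same answer either way
print(col_bytes < row_bytes)    # True: far fewer bytes read for that answer
```

The answers match, but the columnar path reads a small fraction of the data, and real column stores amplify this further with compression and vectorized execution.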
The Solution: Building a Data Pipeline
Never run heavy analytics on your production database. You need a pipeline to move data from your OLTP “Source of Truth” to an OLAP “Analytical Hub.” This is the ETL (Extract, Transform, Load) process.
- Extract: Pull data from your MySQL or PostgreSQL production replicas.
- Transform: Clean the data and strip out sensitive PII (Personally Identifiable Information).
- Load: Push the refined data into a warehouse like BigQuery or Snowflake.
For a modern approach, many teams use Change Data Capture (CDC). This streams every database change to the warehouse in near real-time. Here is a simplified conceptual example using Python and SQLAlchemy:
```python
import pandas as pd
import sqlalchemy

# 1. Extract from a production read replica (PostgreSQL)
prod_engine = sqlalchemy.create_engine('postgresql://user:pass@prod-db/app')
df = pd.read_sql("SELECT * FROM orders WHERE status = 'shipped'", prod_engine)

# 2. Transform: calculate the tax-exclusive price (assuming a 10% tax rate)
df['net_amount'] = df['total_price'] / 1.1

# 3. Load into the analytics warehouse (shown here as a separate PostgreSQL
#    instance; Snowflake and BigQuery need their own dialects/connectors)
warehouse_engine = sqlalchemy.create_engine('postgresql://user:pass@analytics-db/dw')
df.to_sql('fact_sales', warehouse_engine, if_exists='append', index=False)
```
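True CDC usually means tailing the database's write-ahead log with a tool like Debezium, which is beyond a blog snippet. A rough approximation you can build yourself is a watermark-based incremental pull: remember the newest updated_at you've seen, and only copy rows changed since then. Here is a self-contained sketch using sqlite3 to stand in for both source and warehouse (table names and timestamps are invented):

```python
import sqlite3

source = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL, updated_at TEXT)")
warehouse.execute("CREATE TABLE fact_sales (id INTEGER, total REAL, updated_at TEXT)")

source.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 10.0, "2024-06-01T10:00:00"),
    (2, 20.0, "2024-06-01T11:00:00"),
])

def sync(source, warehouse, watermark):
    """Copy rows changed since the last sync; return the new watermark."""
    rows = source.execute(
        "SELECT id, total, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    warehouse.commit()
    return rows[-1][2] if rows else watermark

watermark = sync(source, warehouse, "")          # initial load: 2 rows
source.execute("INSERT INTO orders VALUES (3, 30.0, '2024-06-01T12:00:00')")
watermark = sync(source, warehouse, watermark)   # incremental: only the new row
print(warehouse.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0])  # 3
```

Log-based CDC also captures deletes and avoids clock-skew problems, which a watermark column cannot, but this pattern is often enough to get analytics off the production database.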
Decision Matrix: Which One Do You Need?
Startups often over-engineer this too early. If your dataset is under 100,000 rows, a single PostgreSQL instance is plenty. But once you hit the “Production is slow” wall, use this guide:
Stick with OLTP (PostgreSQL, MySQL) if:
- You need to handle real-time user clicks and logins.
- Data integrity is your top priority.
- Queries usually target single records via an ID.
- Your workload is write-heavy (lots of updates and inserts).
Move to OLAP (ClickHouse, BigQuery, Snowflake) if:
- You are building dashboards or internal reports.
- You need to aggregate across millions or billions of rows.
- Data is mostly “append-only” (like event logs or sensor data).
- Complex joins are slowing down your main application.
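If you like your checklists executable, the two lists above can be condensed into a toy scoring function. The thresholds are made up for illustration and deliberately crude; treat it as a mnemonic, not a capacity-planning tool:

```python
def recommend_engine(writes_per_read, point_lookups, rows_per_query, append_only):
    """Toy heuristic encoding the checklist above; thresholds are invented."""
    score = 0
    if rows_per_query > 1_000_000:
        score += 1   # big aggregations favor OLAP
    if append_only:
        score += 1   # event-log-shaped data favors OLAP
    if writes_per_read > 1:
        score -= 1   # write-heavy workloads favor OLTP
    if point_lookups:
        score -= 1   # single-record ID lookups favor OLTP
    return "OLAP" if score > 0 else "OLTP"

# A checkout service: write-heavy, fetches single orders by ID.
print(recommend_engine(writes_per_read=3, point_lookups=True,
                       rows_per_query=100, append_only=False))        # OLTP

# A reporting dashboard over an append-only event log.
print(recommend_engine(writes_per_read=0.1, point_lookups=False,
                       rows_per_query=50_000_000, append_only=True))  # OLAP
```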
What I’ve Learned After the Fires
The 2 AM incident I described earlier had a quick fix. We killed the rogue query and set up a read-only replica for the marketing team. However, that was just a temporary bandage. The real solution was moving our five-year history into a dedicated OLAP environment.
Don’t wait for your production database to catch fire. If your analytical queries take longer than five seconds, it is time to stop using your transactional engine as a warehouse. Separate your concerns. Keep your OLTP for the users and your OLAP for the insights.

