The 2 AM Nightmare: When Transactions Collide
2:15 AM. My phone is screaming on the nightstand. PagerDuty alerts are flooding in, application latency is vertical, and the database connection pool is officially tapped out. I pull up the logs and find the culprit: ERROR: deadlock detected. Whether you are running MySQL, PostgreSQL, or even MongoDB, deadlocks are the ultimate equalizer. They don’t care about your uptime; they only care about circular dependencies.
A deadlock happens when two transactions hold locks the other needs. Neither can move forward, creating a permanent stall. While modern engines break these cycles automatically, the fallout is real. Every killed transaction means a failed request, a frustrated user, and a potential spike in error rates that can trip your circuit breakers.
How Engines Fight Back: Different Playbooks
MySQL and PostgreSQL both resolve deadlocks automatically, but their internal logic is surprisingly different. Understanding these mechanics helps you tune for performance rather than just survival.
PostgreSQL: The Wait-for Graph
Postgres is cautious. It doesn’t scan for deadlocks constantly, because that would waste precious CPU cycles on healthy transactions. Instead, it waits for a timer, defined by deadlock_timeout (default 1 second). Once a transaction has been blocked on a lock for that full second, Postgres starts hunting. It builds a “wait-for graph” to map out who is blocking whom. If it spots a cycle, it aborts one transaction—typically the one whose wait triggered the detection—allowing the rest of the queue to clear.
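Conceptually, the detector’s job is cycle-finding in a directed graph. Here is a toy model (not Postgres’s actual implementation), simplified so that each transaction waits on at most one other, where `waits_for[x] = y` means transaction x is blocked by y:

```python
def find_cycle(waits_for):
    """Return the list of transactions forming a wait cycle, or None."""
    for start in waits_for:
        seen, node = [], start
        while node in waits_for:
            if node in seen:                  # revisited a node: cycle found
                return seen[seen.index(node):]
            seen.append(node)
            node = waits_for[node]            # follow the "blocked by" edge
    return None

# T1 waits on T2, T2 waits on T3, T3 waits on T1 -> deadlock
print(find_cycle({"T1": "T2", "T2": "T3", "T3": "T1"}))  # ['T1', 'T2', 'T3']
# T1 waits on T2, T2 runs freely -> no deadlock
print(find_cycle({"T1": "T2"}))  # None
```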
MySQL (InnoDB): The Aggressive Hunter
InnoDB takes a more proactive stance. It checks for cycles whenever a transaction has to wait for a lock, so it can kill a deadlock the moment it forms. By default, innodb_deadlock_detect is ON, and the engine is ruthless about picking a victim. It usually rolls back the transaction that has done the least amount of “work” (measured by the size of its undo log) to minimize the cost of the rollback.
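The victim heuristic can be modeled the same way: among the transactions in the cycle, roll back the one with the least undo to discard. A hypothetical sketch (the undo sizes are illustrative, not a real InnoDB API):

```python
def pick_victim(cycle, undo_bytes):
    """Choose the cheapest transaction to roll back: least undo log written."""
    return min(cycle, key=lambda txn: undo_bytes[txn])

# T2 has barely written anything, so it is the cheapest to abort
undo_bytes = {"T1": 4_096, "T2": 128, "T3": 1_048_576}
print(pick_victim(["T1", "T2", "T3"], undo_bytes))  # T2
```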
The Hidden Costs of Automatic Handling
Automatic detection keeps the system alive, but treating it as your only defense is risky. Here is why you should minimize deadlocks rather than just relying on the engine.
The Upside
- System Survival: Without detection, a single deadlock could hang connections indefinitely. This eventually exhausts even a 500-connection pool and takes down the entire service.
- Self-Healing: The database ensures that at least one side of the conflict survives to finish its job.
The Downside
- CPU Tax: When you are pushing 5,000+ transactions per second, constant cycle-checking can itself consume 10-15% of total CPU time. This is why MySQL lets you disable innodb_deadlock_detect on extreme workloads and fall back on lock wait timeouts.
- User Impact: One transaction must die. If your backend doesn’t catch that specific error code, the user gets a generic 500 error and a broken checkout flow.
- The Domino Effect: Even a 1-second delay in PostgreSQL can cause requests to pile up, triggering a cascade of timeouts across your microservices.
Hardening Your Production Setup
Visibility is your best weapon. You cannot optimize what you aren’t logging.
1. PostgreSQL Strategy
Keep deadlock_timeout at 1s. Dropping it to 100ms often creates more noise than signal, as many locks resolve themselves naturally in a few milliseconds. The real win comes from enabling detailed logs:
# postgresql.conf
deadlock_timeout = 1s
log_lock_waits = on
log_recovery_conflict_waits = on
2. MySQL Strategy
Ensure the detector is on, but set a strict safety net with innodb_lock_wait_timeout. The default is 50 seconds, which is far too long for a modern web app. Aim for 5-10 seconds to kill “zombie” transactions that aren’t deadlocked but are just slow.
# my.cnf
innodb_deadlock_detect = ON
innodb_lock_wait_timeout = 10
innodb_print_all_deadlocks = ON  # log every deadlock to the error log, not just the latest
log_error_verbosity = 3
Execution: Prevention and Resilience
A great database engineer writes code that avoids locks and code that survives them when they occur.
Strategy A: Enforce Consistent Lock Ordering
Most deadlocks stem from updating rows in different sequences. If User A updates Account 1 then 2, and User B updates Account 2 then 1, they will eventually collide. The fix is simple: Always sort your IDs before locking.
# Sort IDs so every transaction locks rows in the same sequence
ids = sorted([account_src_id, account_dst_id])

with db.transaction():
    # Lock the lower ID first, then the higher one, in every code path
    db.execute("SELECT * FROM accounts WHERE id = %s FOR UPDATE", (ids[0],))
    db.execute("SELECT * FROM accounts WHERE id = %s FOR UPDATE", (ids[1],))
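A two-thread model shows why the sort works: when both sides take the locks in ascending order, the worst case is a brief wait, never a cycle. (A sketch with `threading.Lock` standing in for row locks.)

```python
import threading

locks = {1: threading.Lock(), 2: threading.Lock()}
completed = []

def transfer(name, src_id, dst_id):
    first, second = sorted([src_id, dst_id])  # always lock the lower ID first
    with locks[first]:
        with locks[second]:
            completed.append(name)            # both "rows" locked, do the work

# Opposite directions, which would deadlock without the sort
t1 = threading.Thread(target=transfer, args=("t1", 1, 2))
t2 = threading.Thread(target=transfer, args=("t2", 2, 1))
t1.start(); t2.start()
t1.join(); t2.join()

print(sorted(completed))  # ['t1', 't2'] -- both transfers finish
```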
Strategy B: The Smart Retry Loop
In high-traffic systems, deadlocks are sometimes an unavoidable statistical reality. Your transactions must be safe to retry, which means they must be idempotent. I use exponential backoff to prevent the “thundering herd” effect, where simultaneous retries just cause more deadlocks.
import time
import psycopg2
from psycopg2 import errorcodes
def execute_with_retry(query, params, max_retries=3):
    for attempt in range(max_retries):
        try:
            with connection.cursor() as cursor:
                cursor.execute(query, params)
                connection.commit()
            return
        except psycopg2.extensions.TransactionRollbackError as e:
            if e.pgcode == errorcodes.DEADLOCK_DETECTED:
                connection.rollback()
                # Back off: wait 100ms, 200ms, then 400ms
                time.sleep(0.1 * (2 ** attempt))
                continue
            raise
    raise Exception(f"Deadlock persists after {max_retries} attempts")
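The same pattern works with any driver. A driver-agnostic sketch with full jitter (a random sleep up to the exponential cap, which spreads out colliding retries even better than fixed backoff), using a hypothetical `DeadlockError` to stand in for whatever exception your client library raises:

```python
import random
import time

class DeadlockError(Exception):
    """Stand-in for your driver's deadlock exception."""

def with_retry(fn, max_retries=3, base_delay=0.01):
    for attempt in range(max_retries):
        try:
            return fn()
        except DeadlockError:
            if attempt == max_retries - 1:
                raise
            # Full jitter: sleep a random amount up to the exponential cap
            time.sleep(random.uniform(0, base_delay * (2 ** attempt)))

# Simulate a transaction that deadlocks twice, then succeeds
calls = []
def flaky_transfer():
    calls.append(1)
    if len(calls) < 3:
        raise DeadlockError("deadlock detected")
    return "committed"

print(with_retry(flaky_transfer))  # committed (on the third attempt)
```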
Strategy C: Diagnostic Deep-Dives
If you are actively being paged, don’t guess. Use these queries to find the exact statements causing the friction.
PostgreSQL Blocking Query:
SELECT
blocked_locks.pid AS blocked_pid,
blocking_locks.pid AS blocking_pid,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted
  AND blocking_locks.granted;
MySQL Status Check:
-- Prints details of the most recent deadlock (LATEST DETECTED DEADLOCK section)
SHOW ENGINE INNODB STATUS;
-- Identify which transactions are currently stalled
SELECT * FROM information_schema.innodb_trx;
SELECT * FROM sys.innodb_lock_waits;
Deadlocks aren’t a sign of failure; they are a side effect of scale and concurrency. By enforcing strict lock ordering and building resilient retry logic, you can turn those 2 AM crises into minor log entries that resolve before you even finish your first cup of coffee.

