The Problem: Querying Data That References Itself
At some point, every backend developer runs into hierarchical data. An employee table where each row has a manager_id pointing to another employee. A category table where each category has a parent_id. A comment thread where replies nest under replies. A folder tree. A bill of materials.
The classic mistake — and I’ve made it myself — is trying to handle this in application code. Fetch the root, fetch its children, fetch their children, repeat until done. It works. But it hammers the database with N+1 queries and turns into a maintenance nightmare the moment the hierarchy grows past three levels.
That’s where a recursive Common Table Expression (CTE) comes in. The pattern takes maybe an hour to get comfortable with. After that, you’ll reach for it every time.
Core Concepts: How Recursive CTEs Actually Work
A regular CTE is a named subquery — a way to write cleaner SQL by labeling a subquery so you can reference it later. A recursive CTE goes further: it lets the query reference itself, expanding one level at a time until there’s nothing left to expand.
Every recursive CTE has three parts:
- Anchor member — the base query that returns starting rows (root nodes)
- Recursive member — a query that joins back against the CTE itself, adding one level per iteration
- Termination condition — implicitly, when the recursive member returns zero rows, the loop stops
The syntax looks like this:
WITH RECURSIVE cte_name AS (
-- Anchor: starting point
SELECT ...
UNION ALL
-- Recursive: join CTE against source table
SELECT ... FROM source_table JOIN cte_name ON ...
)
SELECT * FROM cte_name;
RECURSIVE is required in PostgreSQL. MySQL 8.0+ uses the same syntax — recursive CTE support landed in MySQL 8.0, so anything older won’t run this at all.
One Thing to Watch Out For
Dirty data can have cycles: A is parent of B, B is parent of A. A recursive CTE will loop forever on that. PostgreSQL lets you add a depth counter column and cap it explicitly. MySQL has cte_max_recursion_depth (default: 1000 iterations) as a system-level safety net.
Hands-On Practice
Setting Up a Sample Hierarchy
Classic starting point: an employee table that references itself via manager_id. This schema works identically in PostgreSQL and MySQL 8.0+.
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
role VARCHAR(100),
manager_id INT REFERENCES employees(id)
);
INSERT INTO employees VALUES
(1, 'Alice', 'CEO', NULL),
(2, 'Bob', 'VP Eng', 1),
(3, 'Carol', 'VP Sales', 1),
(4, 'Dave', 'Lead Dev', 2),
(5, 'Eve', 'Developer', 4),
(6, 'Frank', 'Developer', 4),
(7, 'Grace', 'Sales Rep', 3);
Query 1: Full Org Chart, Top to Bottom
This query walks the entire tree starting from the CEO and returns every employee with their depth level:
WITH RECURSIVE org_chart AS (
-- Anchor: root node (no manager)
SELECT
id,
name,
role,
manager_id,
0 AS depth,
name::TEXT AS path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: join each employee to their manager
SELECT
e.id,
e.name,
e.role,
e.manager_id,
oc.depth + 1,
oc.path || ' > ' || e.name
FROM employees e
JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT
repeat(' ', depth) || name AS indented_name,
role,
depth,
path
FROM org_chart
ORDER BY path;
The path column builds a breadcrumb string like Alice > Bob > Dave > Eve. Useful both for debugging and for display — paste it straight into a UI and you’re done.
Query 2: Find All Subordinates of a Specific Person
Say you need everyone who reports to Bob — directly or indirectly — to apply a permission change or calculate team headcount.
WITH RECURSIVE subordinates AS (
-- Anchor: start at Bob (id = 2)
SELECT id, name, role, manager_id
FROM employees
WHERE id = 2
UNION ALL
SELECT e.id, e.name, e.role, e.manager_id
FROM employees e
JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates;
Result: Bob, Dave, Eve, Frank. Four rows. No application-level looping, no N+1.
Query 3: Walk Up the Tree (Find the Chain of Command)
Sometimes you need the opposite direction. Given Eve, find her entire management chain up to the CEO — useful for authorization checks or audit trails.
WITH RECURSIVE chain_of_command AS (
-- Anchor: start at Eve (id = 5)
SELECT id, name, role, manager_id
FROM employees
WHERE id = 5
UNION ALL
SELECT e.id, e.name, e.role, e.manager_id
FROM employees e
JOIN chain_of_command c ON e.id = c.manager_id
)
SELECT * FROM chain_of_command;
Returns Eve → Dave → Bob → Alice. Notice the recursive join is flipped: instead of matching children (e.manager_id = oc.id), we match parents (e.id = c.manager_id). That single reversal is the whole trick.
Practical Tip: Preventing Infinite Loops
Add a depth counter and cap it explicitly if your data might have cycles:
WITH RECURSIVE safe_tree AS (
SELECT id, name, manager_id, 0 AS depth
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id, st.depth + 1
FROM employees e
JOIN safe_tree st ON e.manager_id = st.id
WHERE st.depth < 10 -- Safety cap: stop at depth 10
)
SELECT * FROM safe_tree;
Real-World Pattern: Category Trees
E-commerce category trees hit this problem constantly. A categories table with parent_id is structurally identical to the employee example. Fetching the full path for SEO breadcrumbs takes about two minutes with a recursive CTE:
WITH RECURSIVE category_path AS (
SELECT id, name, parent_id, name::TEXT AS full_path
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.name, c.parent_id,
cp.full_path || ' / ' || c.name
FROM categories c
JOIN category_path cp ON c.parent_id = cp.id
)
SELECT id, full_path FROM category_path ORDER BY full_path;
A Note on Data Preparation
When I’m prototyping these queries with sample data from CSV exports — say, a category export from an e-commerce platform — I often need to convert that CSV into SQL INSERT statements or structured JSON before loading it into a test database. I use toolcraft.app/en/tools/data/csv-to-json for the CSV-to-JSON step. It runs entirely in the browser, so no data leaves your machine — handy when the export contains customer or product data you’d rather not send to a third-party service.
Performance Tips
Recursive CTEs can get expensive on large tables. A few things that help in practice:
- Index the foreign key column. In this example,
manager_idneeds an index. PostgreSQL doesn’t automatically index foreign keys — you add it manually withCREATE INDEX. On a table with 100k+ rows, this alone can drop query time from seconds to milliseconds. - Select only what you need inside the CTE. The recursive member runs once per level. Pulling in wide text columns you don’t use means re-reading that data on every iteration. Select the minimum inside the CTE; join extra columns in the outer query.
- Use
UNION ALL, notUNION.UNIONdeduplicates rows, which adds a sort step. Unless you specifically need deduplication, always useUNION ALLinside a recursive CTE. - Check
EXPLAIN ANALYZE. PostgreSQL’s planner doesn’t always optimize recursive CTEs as aggressively as regular queries. RunEXPLAIN ANALYZEon the full statement — it shows exactly where time is going.
When to Use This — and When Not To
Recursive CTEs fit well when:
- The hierarchy depth is unknown or varies by row
- You need the full tree or a large portion of it
- You want a single database round-trip instead of N+1 queries
Fixed-depth hierarchies — always exactly two or three levels by design — are simpler with regular self-joins. Recursive CTEs shine the moment depth becomes dynamic.
One alternative worth knowing: PostgreSQL’s ltree extension stores materialized paths as a native data type with fast, specialized operators. On very deep or very frequently queried trees, ltree can significantly outperform recursive CTEs. The tradeoff is that your INSERT and UPDATE logic gets more complex. Recursive CTEs need no schema changes, which makes them the right starting point for almost every project — reach for ltree only once you have a measured performance problem.
The Pattern, Summarized
Anchor member, recursive member, termination. That’s it. The pattern feels strange the first time you write it. By the third query, it reads naturally.
Four habits to build now: index the parent/foreign key column, use UNION ALL not UNION, add a depth cap on data you don’t fully control, and run EXPLAIN ANALYZE whenever performance matters. With those in place, you’ll handle most hierarchical data problems in a single SQL statement — no application-level looping required.

