PostgreSQL as a Graph Database: 6 Months with Apache AGE

Table of Contents

Six Months with Apache AGE: Why We Ditched Pure Relational Queries

Six months ago, our logistics platform reached a breaking point. We were building a recommendation engine to track multi-layered dependencies between suppliers, warehouses, and shipping routes. While PostgreSQL is our go-to for structured data, it struggled with deep relationship mapping. Our recursive CTEs became 200-line monsters that took 5 seconds to execute and even longer to debug.

We looked at Neo4j, but the thought of managing another database cluster felt like a maintenance trap. Then I came across Apache AGE (A Graph Extension). It embeds graph functionality directly into PostgreSQL using openCypher. After running it in production for half a year, I’ve seen it transform our data architecture from a tangled mess into a streamlined system.

The 5-Minute Setup

Docker is the path of least resistance for testing AGE. Compiling from source on an existing instance is possible, but version mismatches often cause headaches during the initial build.

docker run --name age-postgres -e POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 -d apache/age

Once your container is live, connect via psql and run the initialization script. These commands prepare the environment and set the necessary search paths.

-- Load the AGE extension
CREATE EXTENSION age;

-- Set the path to include the age namespace
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

In AGE, data lives in “graphs” which function like isolated namespaces. Let’s build a small network to see it in action.

SELECT create_graph('social_network');

-- Create a Person node
SELECT * FROM cypher('social_network', $$
  CREATE (n:Person {name: 'Alice', age: 30})
$$) as (v agtype);

-- Create another Person and a relationship
SELECT * FROM cypher('social_network', $$
  MATCH (a:Person {name: 'Alice'})
  CREATE (b:Person {name: 'Bob', age: 25}),
         (a)-[:FOLLOWS]->(b)
$$) as (v agtype);

How Apache AGE Bridges the Gap

What makes AGE special is its storage engine. It doesn’t just mimic a graph; it maps graph labels to Postgres tables and edges to optimized structures. When you fire off an openCypher query, AGE translates it into a native Postgres execution plan.

The openCypher Advantage

If you’ve touched Neo4j, you already know the syntax. It is built for relationship-heavy data. Compare the readability of finding “friends of friends” below.

Standard SQL (The hard way):

WITH RECURSIVE friends AS (
    SELECT friend_id FROM user_relations WHERE user_id = 1
    UNION
    SELECT r.friend_id FROM user_relations r
    INNER JOIN friends f ON f.friend_id = r.user_id
)
SELECT * FROM friends;

Apache AGE (The clean way):

SELECT * FROM cypher('social_network', $$
  MATCH (u:Person {name: 'Alice'})-[:FOLLOWS*2]->(fof)
  RETURN fof.name
$$) as (fof_name agtype);

The Cypher version is intent-based. You describe the pattern, and AGE handles the traversal logic. In our production environment, this shift slashed our data access layer code by 40% and improved query readability overnight.

The Power of Hybrid Queries

The “aha!” moment for my team was realizing we could join standard relational tables with graph data. This is a massive advantage over standalone graph databases. You can keep your heavy transactional data in strict tables while offloading complex networking logic to the graph engine.

Imagine joining a standard orders table with a customer_connections graph to find influential buyers.

SELECT 
    u.customer_name, 
    o.total_amount
FROM orders o
JOIN (
    SELECT * FROM cypher('social_network', $$
        MATCH (c:Customer)-[:REFERRED]->(other)
        RETURN c.name as customer_name, count(other) as referral_count
    $$) as (customer_name agtype, referral_count agtype)
) AS u ON o.customer_name = u.customer_name::text
WHERE u.referral_count::int > 5;

Lessons from the Trenches

Moving a production workload to AGE taught us a few things the documentation misses. Here is what you need to watch out for.

1. Indexing is Mandatory

Graph traversals aren’t magic. AGE stores properties in a JSONB-like format called agtype. If you are matching nodes by an email or sku, create a GIN index on the underlying label table. Without it, your queries will fall back to slow full table scans.

2. Master the agtype Cast

Results come back as agtype. Your application drivers (Node.js or Python) might see these as strings or objects. Get used to explicit casting in your SQL wrappers, such as (result).property::int or ::text, to ensure your application gets the right data types.

3. Tuning Memory

Deep traversals eat RAM. We found that increasing work_mem from the default 4MB to 256MB was vital for 4-level deep MATCH operations. If your buffer cache hit ratio drops, your traversals will crawl.

4. Check Your Versions

AGE is picky about its host. It currently supports PostgreSQL 11 through 16, but specific AGE releases target specific PG versions. We had to upgrade a legacy PG 10 cluster just to get started, so check the compatibility matrix before you commit.

The Bottom Line

Apache AGE offers the best of both worlds. You get the ACID compliance of PostgreSQL and the flexible modeling of a graph database without the overhead of a multi-database stack. If your SQL joins are getting out of hand, it’s time to stop writing CTEs and start writing Cypher.