PostgreSQL Database Design Patterns: Polymorphic Association vs EAV vs JSONB for Flexible Schema

Database tutorial - IT technology blog
Database tutorial - IT technology blog

The Schema Problem That Hit Us at 2 AM

The alert fired at 2:14 AM. Our product team had just shipped a feature letting users attach custom attributes to orders — size, color, material, warranty period — and the query handling it was timing out at 30 seconds. The table had grown to 40 million rows, and we were using EAV. Classic mistake.

I’ve worked with MySQL, PostgreSQL, and MongoDB across enough projects to have hit this wall more than once. In PostgreSQL specifically, the choice between Polymorphic Association, EAV, and JSONB is one you’ll feel for years. Pick the wrong pattern early and you’re rewriting queries under production pressure. Get it right and it fades into the background.

Everything here comes directly from that 2 AM incident and the refactor that followed.

The Three Approaches at a Glance

Three patterns. Here’s the quick version:

  • Polymorphic Association: A single table references multiple parent tables using a type column + foreign key. Common in Rails-style ORMs.
  • EAV (Entity-Attribute-Value): Store dynamic attributes as rows — each attribute gets its own row with a name/value pair. Very flexible, notoriously painful to query.
  • JSONB: Store dynamic attributes as a JSON blob in a single column. PostgreSQL’s native approach, with indexing support.

Approach Comparison

Polymorphic Association

This pattern appears when a single table needs to reference multiple parent tables. The classic example: comments that can belong to either a post or a video. You add a commentable_type and commentable_id column and call it a day.

-- Polymorphic Association setup
CREATE TABLE comments (
  id          SERIAL PRIMARY KEY,
  body        TEXT NOT NULL,
  commentable_type VARCHAR(50) NOT NULL,  -- 'post' or 'video'
  commentable_id   INTEGER NOT NULL
);

-- No real foreign key constraint here — that's the problem
CREATE INDEX idx_comments_poly ON comments(commentable_type, commentable_id);

No real foreign key constraint — that’s the first red flag. PostgreSQL cannot enforce referential integrity across multiple tables with a single column pair. You’re trading database guarantees for flexibility.

EAV (Entity-Attribute-Value)

EAV feels like the right answer when attribute sets vary wildly across records. A physical product has color, a digital one has download_limit, a subscription has billing_cycle. So you dump everything into a key-value table and move on.

-- EAV setup
CREATE TABLE product_attributes (
  id         SERIAL PRIMARY KEY,
  product_id INTEGER NOT NULL REFERENCES products(id),
  attr_name  VARCHAR(100) NOT NULL,
  attr_value TEXT
);

CREATE INDEX idx_product_attr ON product_attributes(product_id, attr_name);

-- Querying a product with multiple attributes is painful
SELECT
  p.name,
  MAX(CASE WHEN pa.attr_name = 'color' THEN pa.attr_value END) AS color,
  MAX(CASE WHEN pa.attr_name = 'size'  THEN pa.attr_value END) AS size
FROM products p
LEFT JOIN product_attributes pa ON pa.product_id = p.id
GROUP BY p.id, p.name;

That pivot query caused our 2 AM incident. At 40M attribute rows, the GROUP BY + MAX(CASE WHEN) combination was destroying performance — and this was a simple query. Filtering by two attributes simultaneously was worse.

JSONB

PostgreSQL’s JSONB type stores JSON in a binary format, which makes reads fast and enables GIN indexing on keys and values. It’s the database’s own answer to flexible schema.

-- JSONB setup
CREATE TABLE products (
  id         SERIAL PRIMARY KEY,
  name       VARCHAR(255) NOT NULL,
  attributes JSONB DEFAULT '{}'
);

-- GIN index enables fast key/value lookups
CREATE INDEX idx_products_attrs ON products USING GIN(attributes);

-- Insert a product with dynamic attributes
INSERT INTO products (name, attributes)
VALUES (
  'Mechanical Keyboard',
  '{"color": "black", "switch_type": "Cherry MX Red", "backlit": true, "warranty_years": 2}'
);

-- Query by a specific attribute value
SELECT name, attributes->>'color' AS color
FROM products
WHERE attributes @> '{"switch_type": "Cherry MX Red"}';

That last query hits the GIN index. On 2M rows, it runs in under 5ms on modest hardware.

Pros and Cons

Polymorphic Association

  • Pro: Simple to set up, maps naturally to ORM relationships (ActiveRecord, Django GenericForeignKey)
  • Pro: Works well when the set of parent types is small and stable
  • Con: No database-level referential integrity — orphaned rows are a real risk
  • Con: Joins get messy when you need data from the parent table
  • Con: Poor fit for PostgreSQL’s constraint system — you’re doing the ORM’s job, not the database’s

EAV

  • Pro: Maximum flexibility — any attribute, any entity, no schema migration required
  • Pro: Used in production systems like Magento and old-school WordPress meta tables
  • Con: Querying multiple attributes at once is genuinely painful — the SQL gets ugly fast
  • Con: No type enforcement — everything is TEXT, so numeric comparisons break silently
  • Con: Table bloat is severe — one product with 10 attributes becomes 10 rows instead of 1
  • Con: Indexes help but don’t solve the pivot query problem at scale

JSONB

  • Pro: Native PostgreSQL support with GIN indexing on key/value pairs
  • Pro: Flexible without the row explosion of EAV
  • Pro: Handles nested structures, arrays, and boolean values natively
  • Pro: jsonb_set(), jsonb_strip_nulls() and other built-in functions cover most update scenarios
  • Con: Column-level constraints don’t apply inside JSON — no NOT NULL, CHECK, or foreign keys per key
  • Con: Complex queries on deeply nested data get verbose quickly
  • Con: If every query filters on a JSONB key, that key belongs in a real column

Recommended Setup

My rule after that incident: start with JSONB, promote to a real column when a field becomes a first-class filter.

More specifically:

  • Use JSONB when attributes are truly dynamic and sparse — different records have completely different sets of keys. Product catalogs, user preferences, and feature flags are all good fits.
  • Use Polymorphic Association only for relationship-heavy features (comments, attachments, activity feeds) — and document the missing FK constraints explicitly so the next engineer isn’t surprised.
  • Avoid EAV in PostgreSQL unless you’re forced into it by a legacy migration. There’s almost always a better option.

Once a JSONB key starts showing up in 80%+ of your WHERE clauses, pull it out into a real column:

-- Promote a hot JSONB key to a real column
ALTER TABLE products ADD COLUMN color VARCHAR(50)
  GENERATED ALWAYS AS (attributes->>'color') STORED;

CREATE INDEX idx_products_color ON products(color);

This keeps the flexibility of JSONB while giving the query planner a real index to work with.

Implementation Guide

Setting Up a Hybrid JSONB Schema

Stable fields that show up in every WHERE clause get real columns. Everything else goes into JSONB. That’s the whole pattern.

CREATE TABLE products (
  id           SERIAL PRIMARY KEY,
  name         VARCHAR(255) NOT NULL,
  category_id  INTEGER NOT NULL REFERENCES categories(id),
  price        NUMERIC(10, 2) NOT NULL,
  status       VARCHAR(20) NOT NULL DEFAULT 'active',
  attributes   JSONB NOT NULL DEFAULT '{}',
  created_at   TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- GIN index for arbitrary key lookups
CREATE INDEX idx_products_attrs_gin ON products USING GIN(attributes);

-- Regular index for known hot keys
CREATE INDEX idx_products_attr_brand
  ON products ((attributes->>'brand'));

Safe JSONB Updates in Python

One mistake I see constantly: overwriting the entire JSONB column when only one key needs changing. Use jsonb_set() instead:

import psycopg2
import json

conn = psycopg2.connect("dbname=mydb user=postgres")
cur = conn.cursor()

product_id = 42
new_color = "midnight blue"

# Safe: only updates the 'color' key inside attributes
cur.execute("""
    UPDATE products
    SET attributes = jsonb_set(attributes, '{color}', %s::jsonb)
    WHERE id = %s
""", (json.dumps(new_color), product_id))

conn.commit()
cur.close()
conn.close()

Validating JSONB Schema with CHECK Constraints

Column-level type enforcement doesn’t work inside JSONB, but you can enforce required keys with a CHECK constraint:

ALTER TABLE products
  ADD CONSTRAINT chk_attributes_required_keys
  CHECK (
    (attributes ? 'brand') AND
    (jsonb_typeof(attributes->'price_override') IN ('number', 'null'))
  );

Not as tight as typed columns, but better than a completely unconstrained blob. Useful when you need a contract between teams without locking down the full schema.

Migrating from EAV to JSONB

Inheriting an EAV table? Here’s the exact migration path we ran after that 2 AM incident:

-- Step 1: Aggregate EAV rows into JSON per entity
CREATE TABLE product_attributes_jsonb AS
SELECT
  product_id,
  jsonb_object_agg(attr_name, attr_value) AS attributes
FROM product_attributes
GROUP BY product_id;

-- Step 2: Add the JSONB column to products
ALTER TABLE products ADD COLUMN attributes JSONB DEFAULT '{}';

-- Step 3: Backfill
UPDATE products p
SET attributes = paj.attributes
FROM product_attributes_jsonb paj
WHERE paj.product_id = p.id;

-- Step 4: Add GIN index
CREATE INDEX idx_products_attrs ON products USING GIN(attributes);

-- Step 5: Verify, then drop the old EAV table
-- DROP TABLE product_attributes;

After the migration, our slowest query dropped from 30 seconds to 180ms. No more 2 AM alerts from that table.

Final Decision Framework

When I’m designing a new feature, I run through this checklist:

  1. Are the attributes known upfront and queried frequently? → Use real columns.
  2. Are the attributes dynamic but always attached to one entity type? → Use JSONB.
  3. Do you need a relationship between a single entity and multiple parent tables? → Use Polymorphic Association, but document the missing FK constraints explicitly.
  4. Are you tempted to use EAV? → Use JSONB instead. Seriously.

PostgreSQL’s JSONB support is mature. EAV is almost never the right call for new systems. The indexing works. The query syntax is reasonable. The built-in functions handle the common cases. JSONB gives you the flexibility you need without abandoning the relational guarantees that make PostgreSQL worth running in the first place.

Share: