The Data Silo Headache
Six months ago, my team hit a wall. Our transactional data sat in PostgreSQL, but legacy user profiles were trapped in MySQL. Marketing kept dumping lead data into massive 500MB CSV files, while app logs lived in MongoDB. Creating a single unified report required a 400-line Python script that spent more time merging data in memory than actually analyzing it.
Managing these as isolated islands is exhausting. I needed a way to turn PostgreSQL into a central hub without the pain of a full-scale migration. That is why we implemented Foreign Data Wrappers (FDW) in our production environment. It changed everything.
What Exactly is PostgreSQL FDW?
Think of FDW as a bridge. It follows the SQL/MED (SQL Management of External Data) standard, allowing Postgres to treat external sources—SQL, NoSQL, or flat files—as local tables. When you query a foreign table, Postgres ships the request to the remote source, fetches the results, and performs any remaining joins or filtering locally.
After six months in production, this setup has virtually eliminated our need for heavy ETL pipelines for daily reporting. It turns your database into a federated query engine.
Setting Up the Environment
You need specific extensions installed before connecting to external sources. While file_fdw is usually built-in, others like mysql_fdw or mongo_fdw typically require a quick install through your OS package manager.
# On Ubuntu/Debian (the package name tracks your Postgres major version)
sudo apt-get install postgresql-15-mysql-fdw
-- Then inside Postgres
CREATE EXTENSION mysql_fdw;
CREATE EXTENSION file_fdw;
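Once installed, a quick catalog query confirms the wrappers are actually registered in the current database:

```sql
-- List installed FDW-related extensions and their versions
SELECT extname, extversion
FROM pg_extension
WHERE extname LIKE '%fdw%';
```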
Connecting to MySQL
The mysql_fdw extension is a workhorse. In our experience, it handles 10,000+ row joins with negligible overhead. Setup involves three straightforward steps: defining the server, mapping the user, and linking the table.
1. Define the Foreign Server
This tells PostgreSQL where the MySQL instance lives. Be sure to use the internal IP to keep latency low.
CREATE SERVER mysql_legacy_server
FOREIGN DATA WRAPPER mysql_fdw
OPTIONS (host '192.168.1.50', port '3306');
2. Create User Mapping
PostgreSQL needs credentials to access MySQL. I recommend creating a dedicated read-only user in MySQL to limit security risks.
CREATE USER MAPPING FOR postgres
SERVER mysql_legacy_server
OPTIONS (username 'fdw_user', password 'secure_password');
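On the MySQL side, the read-only account only needs SELECT on the schemas you plan to expose. A minimal sketch (user, password, and database names here match the mapping above and are illustrative):

```sql
-- Run on the MySQL server, not in Postgres.
-- Create a dedicated account and grant it read-only access.
CREATE USER 'fdw_user'@'%' IDENTIFIED BY 'secure_password';
GRANT SELECT ON user_db.* TO 'fdw_user'@'%';
```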
3. Define the Foreign Table
You can pull an entire schema at once, but I prefer defining specific tables. It keeps your local schema clean and intentional.
CREATE FOREIGN TABLE mysql_users (
id INT,
username TEXT,
email TEXT
)
SERVER mysql_legacy_server
OPTIONS (dbname 'user_db', table_name 'users');
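If you do want many tables at once, mysql_fdw also supports IMPORT FOREIGN SCHEMA, which generates the foreign table definitions for you. A sketch reusing the server above (the LIMIT TO list, including the hypothetical orders table, is whatever subset you choose to expose):

```sql
-- Pull selected table definitions from the remote user_db schema
-- into the local public schema as foreign tables.
IMPORT FOREIGN SCHEMA user_db
    LIMIT TO (users, orders)
    FROM SERVER mysql_legacy_server
    INTO public;
```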
Integrating MongoDB Data
Mapping a schema-less BSON document to a structured SQL table is slightly more complex. We used this to link our Postgres ‘orders’ table with MongoDB ‘activity_logs’ for fraud detection.
CREATE EXTENSION mongo_fdw;
CREATE SERVER mongo_server
FOREIGN DATA WRAPPER mongo_fdw
OPTIONS (address '127.0.0.1', port '27017');
CREATE USER MAPPING FOR postgres
SERVER mongo_server
OPTIONS (username 'admin', password 'password');
CREATE FOREIGN TABLE mongo_logs (
_id NAME,
user_id INT,
action TEXT,
timestamp TIMESTAMP
)
SERVER mongo_server
OPTIONS (database 'logs_db', collection 'user_actions');
Performance tip: complex MongoDB aggregations don’t always push down perfectly. If a query takes more than a few seconds, check if Mongo is doing the heavy lifting or if Postgres is fetching the entire collection to filter it locally.
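One mitigation worth sketching for when pushdown fails: materialize the hot slice of the Mongo data into a local snapshot and refresh it on a schedule, so dashboards hit a plain Postgres relation instead of the wire. (The view name and seven-day window here are illustrative.)

```sql
-- Snapshot the last 7 days of remote logs locally.
CREATE MATERIALIZED VIEW recent_mongo_logs AS
SELECT user_id, action, "timestamp"
FROM mongo_logs
WHERE "timestamp" > now() - interval '7 days';

-- Re-run on a cron/pg_cron schedule to pick up new remote rows.
REFRESH MATERIALIZED VIEW recent_mongo_logs;
```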
Reading CSV Files Directly
Data analysts love file_fdw. Instead of manually running COPY FROM every time a new spreadsheet arrives, you point a foreign table at the file path; the file is re-read on every scan, so when the CSV is updated the next query sees the new rows immediately. Note that the file lives on the database server and must be readable by the postgres OS user, and creating a foreign table with a filename option requires superuser rights or membership in pg_read_server_files.
CREATE SERVER csv_files_server FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE marketing_leads (
lead_name TEXT,
lead_email TEXT,
source TEXT
)
SERVER csv_files_server
OPTIONS (filename '/var/lib/postgresql/data/leads.csv', format 'csv', header 'true');
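Since the file is parsed fresh on every scan, a quick aggregate doubles as a sanity check on the header and format options:

```sql
-- Count leads per source; a parse error here usually means the
-- format/header options don't match the actual file layout.
SELECT source, count(*) AS leads
FROM marketing_leads
GROUP BY source
ORDER BY leads DESC;
```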
The Power of Cross-Database JOINs
The real payoff is the ‘Mega-Join.’ We can now link a MySQL user to a CSV lead list and filter by MongoDB logs in a single query. It looks like this:
SELECT
u.username,
m.lead_name,
l.action
FROM mysql_users u
JOIN marketing_leads m ON u.email = m.lead_email
JOIN mongo_logs l ON u.id = l.user_id
WHERE l.timestamp > NOW() - INTERVAL '1 day';
This query pulls data from three different systems in a single statement. No Python. No Spark. Just a few lines of SQL.
Real-world Performance and Lessons Learned
After six months of running FDW in production, here are the hard truths. Network latency is the silent killer: every foreign scan costs at least one network round trip, so if your MySQL instance is in a different region, expect a 100 ms penalty or more on every request. Keep your databases in the same VPC whenever possible.
Always check your EXPLAIN plans. Postgres tries to ‘push down’ filters like WHERE id = 10 to the remote server, which is fast. However, if it can’t translate a function, it might pull 1 million rows over the wire just to filter them locally.
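A concrete way to verify this: EXPLAIN VERBOSE prints the SQL the wrapper ships to the remote side, so you can see whether the WHERE clause made it across (the exact label varies by wrapper, e.g. "Remote SQL" for postgres_fdw, "Remote query" for mysql_fdw):

```sql
-- If the id = 10 predicate was pushed down, it appears inside the
-- remote query line; if it shows up as a local "Filter:" node
-- instead, Postgres is fetching rows and filtering after the fact.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT username FROM mysql_users WHERE id = 10;
```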
Security requires extra care. Storing credentials in CREATE USER MAPPING leaves them visible in the system catalogs to privileged users, so use vaulted secrets or environment variables in production. Finally, remember that if the remote server goes down, your Postgres query will hang. Configure strict timeouts in your CREATE SERVER options to prevent one slow system from stalling your entire dashboard.
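The timeout side can be sketched like this; the role name is hypothetical, and wrapper-level option names vary per FDW, so verify them against your wrapper's docs:

```sql
-- Cap every statement for the reporting role locally. This is plain
-- Postgres and works regardless of which wrapper is involved.
ALTER ROLE reporting_user SET statement_timeout = '10s';

-- Wrapper-level connection timeouts: postgres_fdw forwards libpq
-- options like connect_timeout; mysql_fdw and mongo_fdw each have
-- their own option sets, so this exact name is an assumption here.
ALTER SERVER mysql_legacy_server OPTIONS (ADD connect_timeout '5');
```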
FDW is not a replacement for a petabyte-scale data warehouse. But for operational reporting and avoiding messy ETL scripts, it is an essential tool. It lets you apply the full power of SQL across your entire infrastructure with minimal operational overhead.

