Scaling IoT to Billions of Points: A 6-Month InfluxDB Field Report

Database tutorial - IT technology blog

The Breaking Point: When SQL Hits the Wall

Last year, our IoT platform hit a performance wall that no amount of indexing could fix. We were managing 1,200 industrial sensors, each pumping out temperature, humidity, and vibration metrics every single second. That is 72,000 writes per minute, or over 100 million new rows every day. Initially, we threw everything into a standard PostgreSQL instance running on a 16GB RAM cloud server. It held up for the first month, but then the wheels fell off.
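The ingestion math above is easy to sanity-check. A quick back-of-the-envelope script, using only the sensor count and reporting interval from the setup described here:

```python
# Back-of-the-envelope ingestion volume for the setup described above:
# 1,200 sensors, one reading per sensor per second (each reading lands
# as a single row carrying temperature, humidity, and vibration fields).
sensors = 1_200
readings_per_sensor_per_second = 1

writes_per_second = sensors * readings_per_sensor_per_second
writes_per_minute = writes_per_second * 60           # 72,000
rows_per_day = writes_per_second * 60 * 60 * 24      # just over 100 million

print(f"{writes_per_minute:,} writes/minute")
print(f"{rows_per_day:,} rows/day")
```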

Dashboards that used to snap into view suddenly hung for 20 seconds. A simple aggregation, like averaging vibration peaks over the last hour, pinned the database CPU at 100% and effectively locked out other users. While I love PostgreSQL for relational data, I realized we were trying to use a luxury sedan to haul 50 tons of gravel. We needed a specialized industrial tool built for time-series ingestion.

Why Your Standard Database is Choking

It isn’t that MySQL or Postgres are “slow.” The issue is their DNA. Traditional databases use B-Trees for indexing. That works great for looking up a specific user ID, but high-frequency IoT writes create massive index fragmentation: every incoming sensor reading triggers page splits and random disk I/O to keep the index sorted, leading to an IOPS death spiral.

Data retention is the other silent killer. In the IoT world, you rarely need millisecond-level precision for data that is two years old. However, running a DELETE command on a billion-row SQL table is essentially a self-inflicted DDoS attack. It locks the table, bloats the transaction logs, and often fails to actually free up disk space, because deleted rows only mark pages for reuse until a vacuum or table rebuild runs.

Picking the Right Tool for the Job

Before pulling the trigger on a migration, I benchmarked three different architectures:

  • TimescaleDB: A solid Postgres extension. It’s the best choice if your metrics are deeply tied to complex relational tables. However, the storage footprint remains relatively high compared to pure time-series engines.
  • MongoDB: Great for flexible schemas, but it’s a memory hog for time-series. It lacks native functions for things like derivatives or moving averages, forcing you to do the heavy lifting in your application code.
  • InfluxDB: This was our winner. It uses a Time-Structured Merge Tree (TSM) engine, which is architected specifically for high-velocity writes and aggressive data compression.

We chose InfluxDB because the Telegraf agent handled our ingestion pipeline out of the box, and the promised compression stood to save us thousands in monthly storage costs.

Setting Up the Foundation

Getting a production-ready instance up takes minutes with Docker. We deployed the InfluxDB 2.7 image, which bundles the UI, storage engine, and task scheduler into one container.

docker run -d -p 8086:8086 \
  --name influxdb_production \
  -v /var/lib/influxdb2:/var/lib/influxdb2 \
  influxdb:2.7

After hitting http://localhost:8086, you’ll set up an Organization and a Bucket. In InfluxDB, a Bucket isn’t just a database; it’s a container with a built-in expiration date. You decide on day one how long the data should live before it’s automatically purged.
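If you prefer scripting the setup over clicking through the UI, the influx CLI can create a bucket with its retention period in one command. A sketch, assuming the CLI is already authenticated against the instance (the bucket and org names here match the Telegraf config later in this post):

```shell
# Create a bucket whose data is automatically purged after 7 days.
influx bucket create \
  --name raw_sensor_data \
  --org industrial-iot \
  --retention 7d
```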

Telegraf: The Secret to Stability

Don’t waste time writing custom Python or Node.js scripts to push data. We used Telegraf, a lightweight agent that handles the “messy” parts of networking. It batches points together, handles retries if the database is busy, and parses MQTT streams natively.

Here is the config snippet we used to bridge our MQTT broker to InfluxDB:

[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  token = "${INFLUX_TOKEN}"   # read from the environment; never hard-code tokens
  organization = "industrial-iot"
  bucket = "raw_sensor_data"

[[inputs.mqtt_consumer]]
  servers = ["tcp://broker.internal:1883"]
  topics = ["factory/+/metrics"]   # '+' matches any single machine ID
  data_format = "json"             # parse JSON payloads into fields

This approach removed the burden of error handling from our firmware team. If the database goes down for maintenance, Telegraf simply buffers the data in memory until the connection returns.
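Telegraf’s batching and buffering behavior is configuration, not code you write yourself, but the pattern it implements is worth picturing. A toy sketch of the idea (class and method names are mine, not Telegraf’s): points accumulate in memory, and a failed flush re-queues them until the database is reachable again.

```python
from collections import deque

class MetricBuffer:
    """Minimal batch-and-retry buffer in the spirit of what Telegraf does:
    points queue up in memory, and a failed flush keeps them buffered
    until the next attempt succeeds."""

    def __init__(self, batch_size=2, max_buffered=10_000):
        self.batch_size = batch_size
        self.queue = deque(maxlen=max_buffered)  # oldest points drop if full

    def add(self, point):
        self.queue.append(point)

    def flush(self, write_fn):
        """Send queued points in batches; on failure, re-queue and stop."""
        sent = 0
        while self.queue:
            n = min(self.batch_size, len(self.queue))
            batch = [self.queue.popleft() for _ in range(n)]
            try:
                write_fn(batch)
                sent += len(batch)
            except ConnectionError:
                # Put the batch back at the front and wait for the next flush.
                self.queue.extendleft(reversed(batch))
                break
        return sent

# Usage: simulate a database outage followed by recovery.
buf = MetricBuffer(batch_size=2)
for i in range(5):
    buf.add({"vibration": i})

def broken(batch):
    raise ConnectionError("db down")

assert buf.flush(broken) == 0 and len(buf.queue) == 5  # nothing lost
sent = []
assert buf.flush(sent.extend) == 5                     # all delivered on recovery
```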

The Flux Query Language: Power vs. Pain

InfluxDB 2.x uses Flux, a functional query language. If you are used to SELECT * FROM, Flux feels like learning to write code backward. It uses pipes (|>) to pass data through filters and transformations. It’s weird at first, but incredibly powerful for analytics.

To calculate a 5-minute moving average for a specific sensor, the code looks like this:

from(bucket: "raw_sensor_data")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "vibration")
  |> filter(fn: (r) => r["machine_id"] == "CNC-04")
  |> aggregateWindow(every: 5m, fn: mean)
  |> yield(name: "smooth_average")

Doing this in SQL requires window functions or nested subqueries. In Flux, it’s a readable, linear pipeline.
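To make the semantics concrete, here is roughly what aggregateWindow(every: 5m, fn: mean) computes, sketched in plain Python (timestamps in seconds, windows aligned the simple way; real Flux aligns windows to calendar boundaries and labels them differently):

```python
from collections import defaultdict

def aggregate_window_mean(points, every=300):
    """Group (timestamp, value) pairs into fixed windows of `every` seconds
    and average each window -- the core idea of aggregateWindow(fn: mean)."""
    windows = defaultdict(list)
    for ts, value in points:
        windows[ts - ts % every].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(windows.items())}

# Per-second vibration readings spanning two 5-minute windows:
points = [(t, 1.0) for t in range(0, 300)] + [(t, 3.0) for t in range(300, 600)]
print(aggregate_window_mean(points))  # {0: 1.0, 300: 3.0}
```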

Downsampling: How We Saved 85% on Disk

The real magic happened when we automated our data lifecycle. We didn’t need per-second vibration data from three months ago. We only needed the trends. We set up a task to “downsample” our data into different tiers.

  1. High-Res Bucket: Per-second data, auto-deleted after 7 days.
  2. Archive Bucket: 15-minute averages, kept for 2 years.
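The downsampling itself is just a scheduled Flux task that reads the high-res bucket, averages, and writes into the archive. A sketch along these lines (the archive bucket name is illustrative; ours followed the tiering above):

```flux
option task = {name: "downsample_15m", every: 15m}

from(bucket: "raw_sensor_data")
  |> range(start: -task.every)
  |> aggregateWindow(every: 15m, fn: mean)
  |> to(bucket: "sensor_archive", org: "industrial-iot")
```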

Our raw data was eating roughly 40GB of disk space per day. After implementing this 15-minute downsampling task, our long-term storage growth slowed to just 150MB per week. That is a massive reduction in infrastructure overhead without losing the historical insights our business team needed.

The 6-Month Verdict

Moving to a dedicated time-series database was a turning point for our reliability. Our query times dropped from agonizing seconds to sub-100ms responses. More importantly, we stopped worrying about the database crashing as we added more sensors. If your project involves logs, financial tickers, or IoT metrics, stop forcing it into a relational mold. Use a tool designed to handle the pressure.
