Fixing ‘Too many open files’ on Linux: A Guide to File Descriptors and ulimit

Production Lessons: Scaling Linux Resource Limits

Our team recently migrated a microservices cluster handling roughly 15,000 requests per second. For the first few months, the system was stable. Then, during a 40% traffic spike at 2:00 AM on a Friday, the monitoring alerts went off.

The logs filled with java.io.IOException: Too many open files. Nginx stopped accepting new TCP connections, and our PostgreSQL database began dropping queries. If you manage high-concurrency stacks like Node.js, Go, or Elasticsearch, this bottleneck is almost inevitable.

Linux architecture relies on the philosophy that nearly everything is a file: a network socket, a system log, and a database index each consume a File Descriptor (FD).

By default, many Linux distributions set conservative limits—often just 1,024 FDs per process—to protect system resources from runaway scripts. In a modern production environment, these defaults are a liability. After months of fine-tuning these systems across 12 different clusters, I’ve mapped out the exact workflow to move from fragile defaults to a hardened, high-performance configuration.

The Three Layers of Resource Management

Troubleshooting this incident revealed three distinct layers where resource limits are managed. Choosing the wrong layer is the primary reason developers find their changes “disappearing” after a system reboot or a service restart.

1. The Shell Session (ulimit command)

This is the quick-fix method found in most forum posts. Running ulimit -n 65535 in your terminal takes effect instantly. However, the change applies only to that specific shell session and its child processes. Once you log out or the server reboots, the limit resets to the default of 1,024. It is perfect for a quick test, but useless as a permanent production fix.
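Before changing anything, it helps to see both values in the current session; bash’s ulimit builtin reports the soft and hard limits separately:

# Soft limit: what processes in this shell actually get
ulimit -Sn

# Hard limit: the ceiling the soft limit may be raised to
ulimit -Hn

# Raise the soft limit up to the hard limit (no root required)
ulimit -n 4096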

2. User-level Configuration (/etc/security/limits.conf)

This is the traditional approach for defining hard and soft limits for specific users or groups. It is applied by PAM (pam_limits) when a login session starts, which is exactly why it is a common trap: services managed by systemd never go through PAM, so they ignore limits.conf entirely. If you are trying to scale an Nginx service by editing this file, your changes will never take effect.
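If limits.conf appears to be ignored even for interactive SSH logins, verify that pam_limits is actually loaded. The file locations below are the usual ones on Debian/Ubuntu and RHEL-family systems, though they vary by distribution:

# Debian/Ubuntu: pam_limits is normally referenced here
grep pam_limits /etc/pam.d/common-session

# RHEL/AlmaLinux: it usually appears in system-auth
grep pam_limits /etc/pam.d/system-auth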

3. Service-level Overrides (systemd Unit Files)

For modern distributions like Ubuntu 22.04+, Debian, or AlmaLinux, systemd is the source of truth. To change limits for a service like Nginx or MySQL, you must use a service override. This is the most surgical and reliable method for production because it stays tied to the service regardless of how it is started.

Comparing Management Methods

Method            | Pros                                                    | Cons
------------------|---------------------------------------------------------|---------------------------------------------------
ulimit command    | Instant; no root needed to raise the soft limit up to the hard limit. | Lost on logout; not persistent.
limits.conf       | Great for multi-user environments and developers.       | Ignored by systemd; requires a new login session.
systemd overrides | Standard for modern services; survives reboots.         | Requires a daemon-reload; service-specific.
sysctl (kernel)   | Sets the absolute ceiling for the entire OS.            | Won’t help if per-process limits remain low.

A Proven Production Strategy

The most stable strategy is a hybrid approach: raise the kernel-level ceiling first, set reasonable defaults for human users, and then explicitly grant high limits to your heavy-hitter applications. In my experience managing over 50 VPS instances, I’ve learned to always verify limits from the perspective of the running process (see the verification sketch after the checklist below). I once lost three hours of uptime because a hard limit was set lower than a soft limit, causing the OS to silently discard the entire configuration.

The Recommended Setup:

  • Kernel: Set fs.file-max to 2,097,152 (2 million) to ensure the hardware can handle the total global load.
  • Systemd: Apply LimitNOFILE=65535 to every high-traffic service unit.
  • Users: Set a 10,000 FD limit in limits.conf for developers to prevent a single buggy script from crashing the whole node.

Implementation Guide

Step 1: Diagnostic Commands

Don’t guess; check the /proc filesystem. If Nginx is running, you can see its live limits regardless of what the config files say:

# Get the live limits for the Nginx master process (pidof may return several PIDs; awk takes the first)
cat /proc/$(pidof nginx | awk '{print $1}')/limits | grep "Max open files"

To see how many files a process currently has open, use lsof. If this number is close to the default soft limit of 1,024, you are in the danger zone:

# Count active file descriptors for a specific PID
lsof -n -p <PID> | wc -l
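Keep in mind that lsof also lists entries that are not true file descriptors, such as the current working directory and memory-mapped libraries, so the count above slightly overstates real FD usage. Counting the entries under /proc/<PID>/fd gives the exact number:

# Exact file-descriptor count straight from the kernel
ls /proc/<PID>/fd | wc -l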

Step 2: Raising Kernel Limits

If the system-wide limit is too low, no process-level tuning will work. Check your current global max:

cat /proc/sys/fs/file-max

To increase this persistently to 2 million, edit /etc/sysctl.conf:

# Add to /etc/sysctl.conf
fs.file-max = 2097152

# Load the new setting immediately
sysctl -p
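On systems that ship an /etc/sysctl.d/ directory, a drop-in file keeps the tweak separate from the stock configuration; the file name here is an arbitrary choice:

# Same setting as a drop-in file, safe from package updates
echo "fs.file-max = 2097152" > /etc/sysctl.d/99-file-max.conf

# Reload all sysctl configuration files
sysctl --system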

Step 3: The Systemd Override (The Modern Standard)

For services like Nginx or Redis, create an override file. This is cleaner than editing the main service file, which might be overwritten during a package update.

# Opens the override editor for Nginx
systemctl edit nginx

Paste the following block:

[Service]
LimitNOFILE=65535

Apply the changes by reloading the daemon and restarting the service:

systemctl daemon-reload
systemctl restart nginx
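Then confirm that both systemd and the live process picked up the new value:

# What systemd will apply to the service
systemctl show nginx --property=LimitNOFILE

# What the running master process actually has
grep "Max open files" /proc/$(pidof nginx | awk '{print $1}')/limits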

Step 4: User Limits for Scripts

If you have a deploy user that runs manual migration scripts, update /etc/security/limits.conf:

# /etc/security/limits.conf
deploy         soft      nofile         10000
deploy         hard      nofile         65535

Note: Users can raise their “Soft” limit up to the “Hard” limit on their own. The “Hard” limit acts as a ceiling: an unprivileged user can lower it, but only root can raise it.
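Once the deploy user starts a fresh login session (PAM applies the file at login), the behavior looks like this:

# As the deploy user, in a new login session
ulimit -Sn          # 10000 -- soft limit from limits.conf
ulimit -Hn          # 65535 -- hard limit from limits.conf
ulimit -n 65535     # allowed: raising soft up to the hard ceiling
ulimit -n 100000    # fails: cannot exceed the hard limit without root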

Summary

Effective resource management is about visibility. After six months of running these optimized settings, our “Too many open files” errors disappeared entirely. My biggest takeaway? Never trust a config file—trust the output of /proc/[PID]/limits. If that file shows 1024, your process is stuck at 1024. By combining kernel-level tuning with systemd overrides, you build a server environment that stays upright even during sudden traffic spikes.
