Linux CPU Pinning: How to Squeeze Every Drop of Performance from Your Hardware

Linux tutorial - IT technology blog

Why the Default Linux Scheduler Isn’t Always Your Friend

The Linux scheduler is a masterpiece of engineering. For the vast majority of workloads, it does a brilliant job of spreading tasks across all available cores to prevent any single CPU from becoming a bottleneck. However, if you are managing high-frequency trading platforms, low-latency gaming servers, or massive PostgreSQL databases, this automatic balancing can actually hurt you.

The problem is core migration. When the kernel moves a process from Core 0 to Core 1, that process leaves its “warm” L1 and L2 cache data behind. Fetching that data again from the L3 cache or system RAM costs anywhere from tens to a few hundred nanoseconds per miss. That sounds small, but on a hot path these micro-stalls add up quickly. By implementing CPU pinning and respecting NUMA (Non-Uniform Memory Access) boundaries, you can lock your critical apps to specific hardware threads and keep their caches warm.

Quick Start: Pinning a Process in 5 Minutes

The most straightforward tool for managing CPU affinity is taskset. It is part of the standard util-linux package and works on almost every modern distro.

Checking Current Affinity

To see which cores a running process is currently allowed to use, grab its PID and run:

taskset -p 1234

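The output is a hexadecimal bitmask in which bit N stands for CPU N. For example, f (binary 1111) on a 4-core system means the process can run on any of CPUs 0 to 3. Add -c (taskset -cp 1234) to get a readable list such as 0-3 instead.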
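If you want to read those masks without doing binary arithmetic in your head, a small helper can decode them. The sketch below is only an illustration: mask_to_cpus is a hypothetical function name, and it assumes the mask fits into the shell's integer arithmetic (fewer than 63 CPUs).

```shell
#!/bin/sh
# Convert a hexadecimal affinity mask (as printed by `taskset -p`) into a
# space-separated list of CPU numbers. Bit N set means CPU N is allowed.
# Assumption: the mask fits into shell integer arithmetic (< 63 CPUs).
mask_to_cpus() {
    mask=$((0x$1))
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        # Test the lowest bit, then shift the mask right by one position.
        if [ $((mask & 1)) -eq 1 ]; then
            out="$out${out:+ }$cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$out"
}

mask_to_cpus f    # prints: 0 1 2 3
mask_to_cpus c    # prints: 2 3
```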

Launching a New Process on Specific Cores

Suppose you have a Python script that handles heavy data ingestion. To restrict it to cores 0 and 1, use the -c (cpu-list) flag:

taskset -c 0,1 python3 ingest_data.py

Changing Affinity for a Running Process

If your database is already under load and you want to move it to cores 2 and 3 to clear room for other tasks:

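# Assuming the PID is 5678. -a applies the change to every thread of the
# process; without it, only the thread whose ID equals the PID is moved.
taskset -a -cp 2,3 5678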
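Affinity on Linux is a per-thread property, so it is worth checking that every thread of a multi-threaded process ended up where you expect. The following sketch reads Cpus_allowed_list from /proc for each thread. The function name thread_affinities and its optional second argument (an alternative proc root, handy for testing) are assumptions made for this example.

```shell
#!/bin/sh
# Print "TID cpu-list" for every thread of a process by reading
# Cpus_allowed_list from /proc/<pid>/task/<tid>/status.
# $1 = PID, $2 = proc root (hypothetical testing hook, defaults to /proc).
thread_affinities() {
    pid=$1
    proc_root=${2:-/proc}
    for status in "$proc_root/$pid"/task/*/status; do
        [ -r "$status" ] || continue
        tid=${status%/status}   # strip the trailing "/status"
        tid=${tid##*/}          # keep only the thread ID directory name
        list=$(awk '/^Cpus_allowed_list:/ {print $2}' "$status")
        echo "$tid $list"
    done
}

# Show the affinity of the current shell as a quick demonstration.
thread_affinities "$$"
```

For the database example you would call thread_affinities 5678 and expect every line to end in 2-3.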

The NUMA Factor: Why Memory Locality Matters

Modern multi-socket servers, like those powered by dual AMD EPYC or Intel Xeon chips, use NUMA architecture. In these systems, memory is not a single uniform pool. Instead, each CPU socket has its own local RAM. While a core on socket 0 can technically access memory attached to socket 1, doing so requires traveling across the interconnect (like AMD’s Infinity Fabric), which means higher latency and lower bandwidth than a local access.

Visualizing Your Hardware Topology

Before pinning, you must identify which cores belong to which RAM bank. Run lscpu and look for the NUMA section:

lscpu | grep -i numa

You might see an output like this:

NUMA node0 CPU(s):   0-7,16-23
NUMA node1 CPU(s):   8-15,24-31
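Range notation like 0-7,16-23 is compact but awkward to feed into scripts. As a rough sketch, the helpers below expand such a list and turn lscpu-style lines into one line per node. Both function names (expand_cpu_list and numa_map) are invented for this example, and the input is assumed to look exactly like the sample output above.

```shell
#!/bin/sh
# Expand a CPU list such as "0-7,16-23" into individual CPU numbers.
expand_cpu_list() {
    out=""
    old_ifs=$IFS
    IFS=,
    for part in $1; do
        case $part in
            *-*) start=${part%-*}; end=${part#*-} ;;  # a range like 0-7
            *)   start=$part; end=$part ;;            # a single CPU
        esac
        i=$start
        while [ "$i" -le "$end" ]; do
            out="$out${out:+ }$i"
            i=$((i + 1))
        done
    done
    IFS=$old_ifs
    echo "$out"
}

# Read lscpu-style lines on stdin and print "node<N>: <cpus...>".
# Lines that are not "NUMA node<N> CPU(s): ..." are ignored.
numa_map() {
    sed -n 's/^NUMA node\([0-9][0-9]*\) CPU(s):[[:space:]]*\(.*\)$/\1 \2/p' |
    while read -r node list; do
        echo "node$node: $(expand_cpu_list "$list")"
    done
}

# Demonstration with the sample output from the article.
printf 'NUMA node0 CPU(s):   0-7,16-23\nNUMA node1 CPU(s):   8-15,24-31\n' | numa_map
```

On a real machine you would pipe lscpu straight into it: lscpu | numa_map.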

If you pin an application to Core 0 but its data resides in Node 1’s memory, every cache miss has to cross the interconnect, and memory-bound workloads can lose a large share of their throughput. This is why numactl is essential.

Binding Memory and CPU Together

The numactl tool ensures your process stays on the same node for both compute and memory:

# Run the app on NUMA node 0 for both CPU and RAM
numactl --cpunodebind=0 --membind=0 ./my_high_perf_app
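If you already know which core you care about but not which node it lives on, sysfs can tell you. The sketch below looks up the node for a given CPU in /sys/devices/system/node/node*/cpulist and prints a matching numactl command rather than running it. The function name node_of_cpu and its optional second argument (an alternative sysfs directory, used for testing) are assumptions of this example.

```shell
#!/bin/sh
# Print the NUMA node number that contains a given logical CPU.
# $1 = CPU number, $2 = node directory (defaults to /sys/devices/system/node).
# Returns non-zero if no node lists that CPU.
node_of_cpu() {
    cpu=$1
    node_root=${2:-/sys/devices/system/node}
    for f in "$node_root"/node*/cpulist; do
        [ -r "$f" ] || continue
        # Each cpulist looks like "0-7,16-23"; check every range or single value.
        if awk -v cpu="$cpu" -F, '{
                for (i = 1; i <= NF; i++) {
                    n = split($i, r, "-")
                    lo = r[1]; hi = (n > 1 ? r[2] : r[1])
                    if (cpu + 0 >= lo + 0 && cpu + 0 <= hi + 0) found = 1
                }
            } END { exit(found ? 0 : 1) }' "$f"; then
            node=${f%/cpulist}
            echo "${node##*node}"
            return 0
        fi
    done
    return 1
}

# Print (not execute) the numactl command for the node that owns CPU 2.
if node=$(node_of_cpu 2); then
    echo "numactl --cpunodebind=$node --membind=$node ./my_high_perf_app"
fi
```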

Advanced Isolation: Creating Dedicated Cpusets

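While taskset is great for quick fixes, cpuset (via cgroups) allows you to partition your server like a pro. A cpuset confines every process in the group to the cores you choose. To truly “fence off” those cores so that general system tasks stay away from them, you also have to keep the other groups off them, either by restricting their cpusets or, on cgroup v2, by turning the group into a partition root through cpuset.cpus.partition.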

Step 1: Setting up the cgroup

On systems using cgroup v2, you can create a dedicated group for your high-priority app. As root, create the group and enable the cpuset controller for the children of the root cgroup:

mkdir /sys/fs/cgroup/production_app
echo "+cpuset" > /sys/fs/cgroup/cgroup.subtree_control

Step 2: Reserving the Hardware

Next, define which CPUs and memory nodes this group is allowed to touch:

echo "2-3" > /sys/fs/cgroup/production_app/cpuset.cpus
echo "0" > /sys/fs/cgroup/production_app/cpuset.mems

Step 3: Moving the Process

Simply write the process ID to the cgroup.procs file to move it into the isolated environment:

echo 1234 > /sys/fs/cgroup/production_app/cgroup.procs
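The three steps can be folded into a pair of small functions with basic error handling. This is a sketch, not a hardened tool: setup_cpuset and move_to_cpuset are hypothetical names, and the cgroup root is passed as a parameter so the logic can be tried against a scratch directory before touching /sys/fs/cgroup. On the real cgroup filesystem the kernel rejects invalid values, which is what the return codes are meant to surface.

```shell
#!/bin/sh
# Create a cgroup v2 group and restrict it to the given CPUs and memory nodes.
# $1 = cgroup root (normally /sys/fs/cgroup), $2 = group name,
# $3 = CPU list (e.g. "2-3"), $4 = memory node list (e.g. "0").
setup_cpuset() {
    root=$1
    name=$2
    cpus=$3
    mems=$4
    mkdir -p "$root/$name" || return 1
    # Enable the cpuset controller for children of the root cgroup.
    echo "+cpuset" > "$root/cgroup.subtree_control" || return 1
    echo "$cpus" > "$root/$name/cpuset.cpus" || return 1
    echo "$mems" > "$root/$name/cpuset.mems" || return 1
}

# Move a process into the group. $1 = cgroup root, $2 = group name, $3 = PID.
move_to_cpuset() {
    echo "$3" > "$1/$2/cgroup.procs"
}

# Example (as root, on a cgroup v2 system):
#   setup_cpuset /sys/fs/cgroup production_app 2-3 0 &&
#   move_to_cpuset /sys/fs/cgroup production_app 1234
```

After moving the process, reading cpuset.cpus.effective inside the group shows the cores the kernel actually granted.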

Making Pinning Permanent with systemd

Manual commands are fine for testing, but production services should have affinity baked into their configuration. You can do this directly in the [Service] section of your systemd unit file.

Open your service file (e.g., /etc/systemd/system/redis.service) and add these lines:

[Service]
ExecStart=/usr/bin/redis-server
CPUAffinity=0 1
NUMAPolicy=bind
NUMAMask=0

Apply the changes with systemctl daemon-reload and restart your service. This ensures that even after a reboot, your service returns to its designated cores.
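If you would rather not edit the packaged unit file, the same settings can live in a drop-in. The sketch below writes one; the function name write_affinity_dropin and the file name affinity.conf are arbitrary choices for this example, and NUMAPolicy= and NUMAMask= assume a reasonably recent systemd (243 or later).

```shell
#!/bin/sh
# Write a systemd drop-in that pins a unit to given CPUs and one NUMA node.
# $1 = drop-in directory (normally /etc/systemd/system/<unit>.d, needs root),
# $2 = space-separated CPU list (e.g. "0 1"), $3 = NUMA node number.
write_affinity_dropin() {
    dropin_dir=$1
    cpus=$2
    node=$3
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/affinity.conf" <<EOF
[Service]
CPUAffinity=$cpus
NUMAPolicy=bind
NUMAMask=$node
EOF
}

# Example (as root):
#   write_affinity_dropin /etc/systemd/system/redis.service.d "0 1" 0
#   systemctl daemon-reload && systemctl restart redis.service
# One way to verify the result after the restart:
#   pid=$(systemctl show -p MainPID --value redis.service)
#   grep Cpus_allowed_list "/proc/$pid/status"
```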

Lessons from the Field: Best Practices

After managing high-traffic Kubernetes clusters and bare-metal database nodes for years, I’ve found that pinning is a double-edged sword. It is easy to get over-excited and accidentally starve the OS of the resources it needs for networking interrupts or disk I/O.

1. Watch Your NUMA Misses

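Use numastat -p <PID> to see how a process’s memory is spread across NUMA nodes, and plain numastat (no arguments) for the system-wide numa_hit and numa_miss counters. If most of a pinned process’s memory sits on a different node from its cores, or the numa_miss counter is climbing, your application is reaching across the interconnect to fetch data from a remote RAM bank. This is a clear sign that your pinning strategy is misaligned with your hardware.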
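When numastat is not installed, the kernel exposes similar per-process information in /proc/<PID>/numa_maps, where each mapping carries tokens such as N0=2 N1=1 (pages on node 0 and node 1). The sketch below adds them up per node. The function name pages_per_node is made up for this example, and the counts are in pages of each mapping's own page size, so treat the totals as a rough indicator rather than exact bytes.

```shell
#!/bin/sh
# Sum the N<node>=<pages> tokens of a numa_maps file per NUMA node.
# $1 = path to a numa_maps file, e.g. /proc/1234/numa_maps.
pages_per_node() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^N[0-9]+=[0-9]+$/) {
                split($i, kv, "=")
                pages[substr(kv[1], 2)] += kv[2]   # strip the leading "N"
            }
    }
    END { for (n in pages) print "node" n, pages[n] }' "$1" | sort
}

# Example: pages_per_node /proc/1234/numa_maps
```

For a process pinned to node 0, a large total on any other node suggests its memory was allocated before the pinning took effect.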

2. The Ultimate Isolation: isolcpus

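If you have a process that cannot tolerate even a millisecond of interference, use the isolcpus kernel parameter. Add isolcpus=2,3 to GRUB_CMDLINE_LINUX in /etc/default/grub, regenerate the GRUB configuration (update-grub on Debian and Ubuntu, grub2-mkconfig on RHEL-family systems), and reboot. The scheduler then keeps ordinary tasks off those cores by default. They will sit almost idle (per-CPU kernel threads can still run there) until you manually assign a process to them using taskset.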
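After the reboot it is worth confirming that the parameter actually reached the kernel. A minimal sketch, assuming the usual /proc/cmdline location: the helper isolcpus_value (a name chosen for this example) extracts the raw isolcpus= value, including any flags it may carry.

```shell
#!/bin/sh
# Print the value of the isolcpus= parameter from a kernel command line file.
# $1 = file to read (defaults to /proc/cmdline). Prints nothing if unset.
isolcpus_value() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^isolcpus=//p'
}

# Show the value for the running kernel, if /proc/cmdline is available.
if [ -r /proc/cmdline ]; then
    isolcpus_value
fi
```

The kernel's own view of the isolated set is also readable from /sys/devices/system/cpu/isolated.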

3. Beware of Hyper-threading

Don’t forget that two logical CPUs (CPU 0 and CPU 1 on some machines, CPU 0 and CPU 16 on others, depending on how the firmware numbers them) might be hardware threads sharing the same physical core. For tasks that are compute-bound, pinning two heavy threads to the same physical core will cause them to fight for the same execution units. Always check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list to identify which logical IDs share physical hardware.
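Checking one CPU at a time gets tedious on a large box. The sketch below walks every thread_siblings_list file and prints each distinct sibling group once, so every output line corresponds to one physical core. The function name list_physical_cores and its optional argument (an alternative sysfs directory, used for testing) are assumptions of this example.

```shell
#!/bin/sh
# Print one line per physical core, listing the logical CPUs that share it.
# $1 = CPU directory (defaults to /sys/devices/system/cpu).
list_physical_cores() {
    cpu_root=${1:-/sys/devices/system/cpu}
    for f in "$cpu_root"/cpu[0-9]*/topology/thread_siblings_list; do
        [ -r "$f" ] && cat "$f"
    done | sort -u   # siblings report the same list, so deduplicate
}

# Show the sibling groups of the current machine (empty output off Linux).
list_physical_cores
```

Pick one logical CPU from each line when you pin compute-bound threads, so that no two of them share execution units.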
