Why the Default Linux Scheduler Isn’t Always Your Friend
The Linux scheduler is a masterpiece of engineering. For the vast majority of workloads, it does a brilliant job of spreading tasks across all available cores so that no single CPU becomes a bottleneck. However, if you are managing high-frequency trading platforms, low-latency gaming servers, or massive PostgreSQL databases, this automatic balancing can actually hurt you.
The problem is task migration. When the kernel moves a process from Core 0 to Core 1, that process leaves its “warm” L1 and L2 cache data behind, and its accesses miss on the new core until those caches refill. Refilling from the L3 cache (when the new core shares it) typically costs tens of nanoseconds per miss, and going all the way to system RAM costs roughly 100 nanoseconds or more. While that sounds small, these micro-stalls add up quickly in a hot path that runs millions of times per second. By implementing CPU pinning and respecting NUMA (Non-Uniform Memory Access) boundaries, you can lock your critical apps to specific hardware threads so their caches stay warm.
Quick Start: Pinning a Process in 5 Minutes
The most straightforward tool for managing CPU affinity is taskset. It is part of the standard util-linux package and works on almost every modern distro.
Checking Current Affinity
To see which cores a running process is currently allowed to use, grab its PID and run:
taskset -p 1234
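The output will likely be a hexadecimal mask, such as “pid 1234's current affinity mask: f”. Each bit stands for one CPU, so f (binary 1111) on a 4-core system means the process may run on CPUs 0 through 3, in other words anywhere. Add -c (taskset -cp 1234) to get a human-readable CPU list instead of a mask.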
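If you want to decode a mask by hand, remember that bit N stands for CPU N. The small POSIX shell helper below is a sketch of that conversion, not part of taskset; the function name mask_to_cpus is hypothetical, and it assumes the mask is a plain hex string, optionally with a 0x prefix or with the comma separators used in /proc/<pid>/status.

```shell
#!/bin/sh
# Convert a hexadecimal affinity mask (as printed by `taskset -p`) into a
# space-separated list of CPU numbers. The mask is processed one hex digit
# at a time, so masks wider than 64 bits work too. Commas and an optional
# 0x prefix are stripped first.
mask_to_cpus() {
  echo "$1" | tr -d ',' | sed 's/^0[xX]//' | awk '{
    out = ""
    base = 0
    # Walk from the least significant (rightmost) hex digit.
    for (i = length($0); i >= 1; i--) {
      d = index("0123456789abcdef", tolower(substr($0, i, 1))) - 1
      # Each hex digit covers 4 CPUs: base, base+1, base+2, base+3.
      for (bit = 0; bit < 4; bit++) {
        if (int(d / 2^bit) % 2 == 1)
          out = out (out == "" ? "" : " ") (base + bit)
      }
      base += 4
    }
    print out
  }'
}

mask_to_cpus f      # prints: 0 1 2 3
mask_to_cpus 0xc    # prints: 2 3
```

For a live process, mask_to_cpus "$(taskset -p 1234 | awk '{print $NF}')" prints the allowed CPUs, although taskset -cp already does this for you.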
Launching a New Process on Specific Cores
Suppose you have a Python script that handles heavy data ingestion. To restrict it to cores 0 and 1, use the -c (cpu-list) flag:
taskset -c 0,1 python3 ingest_data.py
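Affinity set this way is inherited by everything the launched program starts. One quick way to see what the kernel actually applied is to launch a tiny command under the same taskset and let it read its own Cpus_allowed_list from /proc/self/status. This is a sketch; effective_cpus is a hypothetical helper name, and it assumes a Linux /proc filesystem and the util-linux taskset.

```shell
#!/bin/sh
# Start a tiny command under taskset and let it print its own allowed CPUs.
# Because awk reads /proc/self/status, the output is the affinity the kernel
# actually applied to a process launched with this CPU list.
effective_cpus() {
  taskset -c "$1" awk '/^Cpus_allowed_list/ { print $2 }' /proc/self/status
}

# Example: check the list first, then launch the real job the same way.
#   effective_cpus 0,1        # "0-1" on a machine that has CPUs 0 and 1
#   taskset -c 0,1 python3 ingest_data.py
```

The kernel reports the allowed set as a compact range list, so 0,1 comes back as 0-1.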
Changing Affinity for a Running Process
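If your database is already under load and you want to move it to cores 2 and 3 to clear room for other tasks, add -p and pass its PID. By default, taskset changes only the single task (thread) whose ID you pass, so add -a (--all-tasks) to move every thread of a multithreaded server. Child processes that already exist keep their old affinity; only children forked afterward inherit the new one.
# Assuming the PID is 5678; -a applies the new list to all of its threads
taskset -a -cp 2,3 5678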
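To confirm that a move reached every thread, you can read each thread’s allowed-CPU list from /proc/<pid>/task/<tid>/status. The sketch below assumes the standard Linux /proc layout; thread_affinities is a hypothetical helper name.

```shell
#!/bin/sh
# Print "<thread-id> <allowed CPUs>" for every thread of process $1,
# read from /proc/<pid>/task/<tid>/status.
thread_affinities() {
  for t_status in /proc/"$1"/task/*/status; do
    [ -r "$t_status" ] || continue     # no such process, or thread already exited
    t_id=${t_status%/status}           # strip "/status" -> .../task/<tid>
    t_id=${t_id##*/}                   # keep the last component -> <tid>
    awk -v tid="$t_id" '/^Cpus_allowed_list/ { print tid, $2 }' "$t_status"
  done
}

# Example (PID from the article): thread_affinities 5678
```

Every line should read 2-3 after the move. If only the main thread does, rerun taskset with -a (--all-tasks) so the change applies to all threads.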
The NUMA Factor: Why Cache Locality Matters
Modern multi-socket servers, like those powered by dual AMD EPYC or Intel Xeon chips, use NUMA architecture. In these systems, memory is not a single uniform pool. Instead, each CPU socket has its own local RAM. While a core on socket 0 can technically access memory attached to socket 1, doing so requires traveling across the inter-socket interconnect (like AMD’s Infinity Fabric or Intel’s UPI), which adds latency and competes for shared interconnect bandwidth.
Visualizing Your Hardware Topology
Before pinning, you must identify which cores belong to which NUMA node, and therefore which bank of RAM is local to them. Run lscpu and look for the NUMA section:
lscpu | grep -i numa
You might see an output like this:
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31
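If you pin an application to Core 0 (node 0) but its data resides in Node 1’s memory, every cache miss has to cross the interconnect, and throughput can drop sharply. taskset controls only where a process runs. It does not move memory that is already allocated on another node, which is why numactl is essential.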
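If you need to check a specific core’s node from a script, the lscpu lines above are easy to parse. The sketch below uses a hypothetical function, numa_node_of_cpu, that expands the range lists and prints the owning node. It assumes English lscpu labels, so run lscpu with LC_ALL=C on localized systems.

```shell
#!/bin/sh
# Read lscpu output on stdin and print the NUMA node ("node0", "node1", ...)
# whose CPU list contains CPU $1. Exits with status 1 if no node lists it.
numa_node_of_cpu() {
  awk -v cpu="$1" '
    /^NUMA node[0-9]+ CPU\(s\):/ {
      if (NF < 4) next                 # node with no CPUs (memory-only)
      n = split($NF, ranges, ",")      # e.g. "8-15,24-31"
      for (i = 1; i <= n; i++) {
        lo = ranges[i]; hi = ranges[i]
        if (index(ranges[i], "-")) { split(ranges[i], b, "-"); lo = b[1]; hi = b[2] }
        if (cpu + 0 >= lo + 0 && cpu + 0 <= hi + 0) { print $2; found = 1; exit }
      }
    }
    END { exit (found ? 0 : 1) }
  '
}

# Example: LC_ALL=C lscpu | numa_node_of_cpu 18
```

With the topology shown above, CPU 18 belongs to node0 and CPU 9 to node1.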
Binding Memory and CPU Together
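The numactl tool binds both compute and memory to the same node. Note that --membind is strict. If node 0 runs out of free memory, the kernel will not fall back to node 1. Instead, it reclaims what it can on node 0 and may invoke the OOM killer. Use --preferred=0 instead if you would rather accept occasional remote memory than risk that.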
# Run the app on NUMA node 0 for both CPU and RAM
numactl --cpunodebind=0 --membind=0 ./my_high_perf_app
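If you script this, it helps to fail early when the node does not exist and to show which CPUs the process will land on. The wrapper below is a sketch with a hypothetical name, run_on_node. It reads the node’s CPU list from /sys/devices/system/node/node<N>/cpulist and then calls numactl with the same two flags used above.

```shell
#!/bin/sh
# Run a command with CPU and memory both bound to NUMA node $1.
# Fails early if the node does not exist, and reports the node's CPUs.
run_on_node() {
  node=$1
  shift
  cpulist_file=/sys/devices/system/node/node$node/cpulist
  if [ ! -r "$cpulist_file" ]; then
    echo "NUMA node $node not found on this machine" >&2
    return 1
  fi
  echo "binding to node $node (CPUs $(cat "$cpulist_file"))" >&2
  numactl --cpunodebind="$node" --membind="$node" "$@"
}

# Example: run_on_node 0 ./my_high_perf_app
```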
Advanced Isolation: Creating Dedicated Cpusets
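While taskset is great for quick fixes, cpuset (via cgroups) lets you partition your server like a pro. A cpuset confines a group of processes to specific cores and memory nodes, and processes inside it cannot widen their own affinity beyond what the group allows. Keep in mind that this restricts your application, not everyone else. To truly “fence off” the cores so the OS won’t use them for general system tasks, you also need to keep other workloads off them. You can do this with systemd’s AllowedCPUs= setting on system.slice and user.slice, or with the isolcpus parameter covered below.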
Step 1: Setting up the cgroup
On systems using cgroup v2, you can create a dedicated group for your high-priority app. Run the following as root. First, enable the cpuset controller for the root group’s children, then create the group:
echo "+cpuset" > /sys/fs/cgroup/cgroup.subtree_control
mkdir /sys/fs/cgroup/production_app
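Before writing to these files, it is worth confirming that /sys/fs/cgroup really is a cgroup v2 mount and that the cpuset controller is available there. The preflight check below is a sketch with a hypothetical name, cpuset_v2_ready. It relies on stat reporting the filesystem type cgroup2fs for the unified hierarchy, and on the cgroup.controllers file listing cpuset.

```shell
#!/bin/sh
# Preflight check: succeed only if the given directory (default /sys/fs/cgroup)
# is a cgroup v2 mount whose cgroup.controllers file lists cpuset.
cpuset_v2_ready() {
  cg_root=${1:-/sys/fs/cgroup}
  # The unified (v2) hierarchy reports its filesystem type as cgroup2fs.
  [ "$(stat -f -c %T "$cg_root" 2>/dev/null)" = cgroup2fs ] || return 1
  grep -qw cpuset "$cg_root/cgroup.controllers" 2>/dev/null
}

if cpuset_v2_ready; then
  echo "cgroup v2 with cpuset available"
else
  echo "cpuset on cgroup v2 not available here" >&2
fi
```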
Step 2: Reserving the Hardware
Next, define which CPUs and memory nodes this group is allowed to touch:
echo "2-3" > /sys/fs/cgroup/production_app/cpuset.cpus
echo "0" > /sys/fs/cgroup/production_app/cpuset.mems
Step 3: Moving the Process
Simply write the process ID to the cgroup.procs file to move the process, along with all of its threads, into the confined group:
echo 1234 > /sys/fs/cgroup/production_app/cgroup.procs
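To verify the move, look up the process’s group in /proc/<pid>/cgroup (on cgroup v2, the relevant line starts with 0::) and read the group’s effective CPU and memory-node lists. The helpers below are a sketch with hypothetical names. They assume the unified hierarchy is mounted at /sys/fs/cgroup.

```shell
#!/bin/sh
# Print the cgroup v2 path of process $1, e.g. "/production_app".
cgroup_v2_path() {
  # The unified-hierarchy line in /proc/<pid>/cgroup has the form "0::<path>".
  sed -n 's/^0:://p' "/proc/$1/cgroup" 2>/dev/null
}

# Show the group and the CPUs / memory nodes it is effectively allowed to use.
show_cpuset() {
  cg=$(cgroup_v2_path "$1")
  if [ -z "$cg" ]; then
    echo "pid $1: no cgroup v2 entry found" >&2
    return 1
  fi
  cg_dir=/sys/fs/cgroup${cg%/}   # "/" (the root group) maps to /sys/fs/cgroup itself
  echo "pid $1 is in $cg"
  echo "cpus: $(cat "$cg_dir/cpuset.cpus.effective" 2>/dev/null)"
  echo "mems: $(cat "$cg_dir/cpuset.mems.effective" 2>/dev/null)"
}

# Example: show_cpuset 1234
```

After the steps above, show_cpuset 1234 should report /production_app with cpus 2-3 and mems 0.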
Making Pinning Permanent with systemd
Manual commands are fine for testing, but production services should have affinity baked into their configuration. You can do this directly in the [Service] section of your systemd unit file.
Open your service file (e.g., /etc/systemd/system/redis.service) and add these lines:
[Service]
ExecStart=/usr/bin/redis-server
CPUAffinity=0 1
NUMAPolicy=bind
NUMAMask=0
Apply the changes with systemctl daemon-reload and restart your service. This ensures that even after a reboot, your service returns to its designated cores.
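After the restart, you can confirm that the unit’s main process picked up CPUAffinity= by asking systemd for its MainPID and reading the allowed-CPU list from /proc. This is a sketch with a hypothetical function name, unit_affinity. It checks CPU affinity only, not the NUMA policy; numastat -p is a better tool for memory placement.

```shell
#!/bin/sh
# Print the allowed-CPU list of a systemd unit's main process.
unit_affinity() {
  pid=$(systemctl show -p MainPID --value "$1" 2>/dev/null)
  # MainPID is 0 (or the query fails) when the unit is not running.
  if [ -z "$pid" ] || [ "$pid" = 0 ]; then
    echo "$1: no running main process" >&2
    return 1
  fi
  grep '^Cpus_allowed_list' "/proc/$pid/status"
}

# Example: unit_affinity redis.service
```

With CPUAffinity=0 1 as above, the reported Cpus_allowed_list should be 0-1.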
Lessons from the Field: Best Practices
After managing high-traffic Kubernetes clusters and bare-metal database nodes for years, I’ve found that pinning is a double-edged sword. It is easy to get over-excited and accidentally starve the OS of the resources it needs for networking interrupts or disk I/O.
1. Watch Your NUMA Misses
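Use numastat -p <PID> to see how much of a process’s memory sits on each node. If a large share lives on a node other than the one its threads are pinned to, your application is reaching across the socket interconnect to fetch data from a remote RAM bank. For a system-wide view, run plain numastat to see per-node counters. A climbing numa_miss means pages are being allocated on a node other than the one the process preferred. Either signal is a clear sign that your pinning strategy is misaligned with your hardware.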
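Because the per-node counters are cumulative since boot, a percentage is easier to read than raw numbers. The sketch below uses a hypothetical helper, miss_pct, to read /sys/devices/system/node/node*/numastat, the counters plain numastat reports. It prints what share of each node’s allocations were misses. Run it a few minutes apart to see whether that share is climbing.

```shell
#!/bin/sh
# Percentage of page allocations on one node that were numa_miss (placed here
# although the process preferred another node). $1 is a numastat file.
miss_pct() {
  awk '$1 == "numa_hit"  { hit = $2 }
       $1 == "numa_miss" { miss = $2 }
       END {
         total = hit + miss
         pct = 0
         if (total > 0) pct = 100 * miss / total
         printf "%.2f\n", pct
       }' "$1"
}

# Report every node; the counters are cumulative since boot.
for f in /sys/devices/system/node/node*/numastat; do
  [ -r "$f" ] || continue
  node_dir=${f%/numastat}
  echo "${node_dir##*/}: $(miss_pct "$f")% numa_miss"
done
```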
2. The Ultimate Isolation: isolcpus
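If you have a process that cannot tolerate even a millisecond of interference, use the isolcpus kernel parameter. Add isolcpus=2,3 to the GRUB_CMDLINE_LINUX line in /etc/default/grub, regenerate the bootloader configuration (typically update-grub on Debian and Ubuntu, or grub2-mkconfig -o /boot/grub2/grub.cfg on RHEL-family systems), and reboot. The kernel will then keep ordinary tasks off those cores, and they will sit idle until you explicitly assign a process to them with taskset. There are two caveats. First, per-CPU kernel threads and some interrupts can still run there. Second, the scheduler does not load-balance across isolated cores, so a multithreaded process pinned to 2,3 may end up with all its threads on one core. Pin each latency-critical thread to its own isolated core.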
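Before pinning a latency-critical thread, you can check that the isolation actually took effect after the reboot. The kernel lists isolated CPUs in /sys/devices/system/cpu/isolated. The sketch below uses hypothetical helper names (expand_cpulist, all_isolated) and only reads that file. It does not change any settings.

```shell
#!/bin/sh
# Expand a kernel CPU list such as "2-3,8" into "2 3 8".
expand_cpulist() {
  echo "$1" | awk -F, '{
    out = ""
    for (i = 1; i <= NF; i++) {
      if ($i == "") continue
      lo = $i; hi = $i
      if (index($i, "-")) { split($i, b, "-"); lo = b[1]; hi = b[2] }
      for (c = lo + 0; c <= hi + 0; c++) out = out (out == "" ? "" : " ") c
    }
    print out
  }'
}

# Succeed only if every CPU in the list $1 appears in the kernel's isolated set.
all_isolated() {
  isolated=" $(expand_cpulist "$(cat /sys/devices/system/cpu/isolated 2>/dev/null)") "
  for cpu in $(expand_cpulist "$1"); do
    case $isolated in
      *" $cpu "*) ;;
      *) echo "CPU $cpu is not isolated" >&2; return 1 ;;
    esac
  done
}

# Example (latency_critical_app is a placeholder for your own binary):
#   all_isolated 2,3 && taskset -c 2 ./latency_critical_app
```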
3. Beware of Hyper-threading
Don’t forget that two logical CPUs, such as CPU 0 and CPU 1, might be hyper-threads sharing the same physical core. Sibling numbering varies by platform, and on many servers siblings are numbered far apart (for example, 0 and 16). For tasks that are compute-bound, pinning two heavy threads to the same physical core will cause them to fight for the same execution units and caches. Always check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list (or the CORE column of lscpu -e) to identify which logical IDs share physical hardware.
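To pin one heavy thread per physical core, you need one logical CPU from each sibling group. The sketch below uses a hypothetical function, one_thread_per_core, to read the thread_siblings_list lines for every CPU. It keeps the first CPU of each group and produces a list you can pass to taskset -c.

```shell
#!/bin/sh
# Read thread_siblings_list lines on stdin (one per logical CPU) and print
# the first logical CPU of each physical core, comma-separated for taskset -c.
one_thread_per_core() {
  sort -u | while IFS= read -r siblings; do
    echo "${siblings%%[,-]*}"   # "0,16" or "0-1" -> "0"
  done | sort -n | paste -sd, -
}

# Example:
#   cat /sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list | one_thread_per_core
```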

