Mastering Sysstat on Linux: Using sar, iostat, and mpstat to Diagnose Intermittent Performance Issues

It’s 2 AM and Your Server Was Slow Three Hours Ago

Your monitoring alert fired at 11 PM. Response times spiked. Users complained. By the time you SSH’d in, everything looked fine — CPU idle at 90%, load average normal. The incident passed, but you have no idea what caused it.

This is exactly where sysstat saves you. Unlike top or htop, sysstat doesn’t just show you what’s happening right now. It collects performance data continuously in the background — CPU usage, disk I/O, memory, network — and stores it historically. When something breaks at 11 PM and you look at it at 2 AM, you can still go back and see exactly what happened.

I’ve been burned by intermittent slowdowns on my production Ubuntu 22.04 server with 4GB RAM more times than I’d like to admit. Once I set up sysstat properly, I finally had the forensic data to pinpoint the cause — a cron job hammering disk I/O at specific intervals.

Context & Why: What Sysstat Actually Does

Sysstat is a collection of performance monitoring utilities for Linux. The three you’ll use most are:

sar (System Activity Reporter) — the main tool. Reports CPU, memory, I/O, network, and more from historical data files.
iostat — focused on CPU and disk I/O statistics. Great for identifying bottlenecked storage devices.
mpstat — per-CPU statistics. Helps you spot if one core is maxed out while others are idle (common with single-threaded processes).

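The secret weapon here is sadc (System Activity Data Collector). It runs on a schedule, every 10 minutes by default, writing binary data files to /var/log/sysstat/ (/var/log/sa/ on RHEL-family systems). These files are your time machine: you can query any time window inside the retention period, which is 7 days by default on Debian/Ubuntu and 28 days once you raise it as described under Configuration.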

The diagnostic flow is simple: alert fires, you SSH in, everything looks fine. Then you pull up sar for the exact time window of the incident and see what was actually happening. No guessing.

Installation

On Debian/Ubuntu systems:

sudo apt update
sudo apt install sysstat -y

On RHEL/CentOS/AlmaLinux:

sudo dnf install sysstat -y

One gotcha: data collection is disabled by default on most Debian/Ubuntu distros after installation. You need to flip a switch:

# On Debian/Ubuntu — edit the sysstat config
sudo nano /etc/default/sysstat

Change ENABLED="false" to ENABLED="true", then save.
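If you would rather script that edit than open an editor, sed can make the same change. This is a minimal sketch, assuming GNU sed and the ENABLED= line format described above; the helper name enable_collection is hypothetical and not part of sysstat.

```shell
# Hypothetical helper (not shipped with sysstat): set ENABLED="true" in a
# Debian-style defaults file. Assumes GNU sed for the -i flag.
enable_collection() {
    conf=${1:-/etc/default/sysstat}
    # Rewrite only the line that starts with ENABLED=, whatever its current value
    sed -i 's/^ENABLED=.*/ENABLED="true"/' "$conf"
    # Print the resulting setting so the change can be confirmed
    grep '^ENABLED=' "$conf"
}
```

Run it from a root shell as enable_collection, or pass the path of a copy of the file first to see what it would do.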

# Start and enable the sysstat service
sudo systemctl enable sysstat
sudo systemctl start sysstat

# Verify it's running
sudo systemctl status sysstat

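On RHEL-based systems the service is typically enabled automatically after installation. Confirm by checking that the cron job exists (if the file is absent, your release probably schedules collection with systemd timers instead, and systemctl list-timers 'sysstat*' will show them):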

ls /etc/cron.d/sysstat
cat /etc/cron.d/sysstat

Configuration

Adjusting Collection Interval

The default 10-minute interval is fine for casual use. For production servers dealing with spiky workloads, drop it to 2 minutes — you get much finer resolution during incidents without meaningful overhead.

Edit the cron file:

sudo nano /etc/cron.d/sysstat

Change the default entry from:

# Run system activity accounting tool every 10 minutes
*/10 * * * * root command -v debian-sa1 > /dev/null && debian-sa1 1 1

To:

# Collect every 2 minutes
*/2 * * * * root command -v debian-sa1 > /dev/null && debian-sa1 1 1
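One caveat worth checking before you rely on the cron edit: sysstat also ships a systemd timer unit, sysstat-collect.timer, and on releases where that timer drives collection the cron change may have no effect. Run systemctl list-timers 'sysstat*' to see whether it is active. If it is, a drop-in override along these lines should change the interval. This is a sketch under that assumption; the helper name write_timer_override is hypothetical.

```shell
# Hypothetical helper: write a systemd drop-in that changes how often
# sysstat-collect.timer fires. Only relevant if that timer is active.
write_timer_override() {
    minutes=$1
    dir=${2:-/etc/systemd/system/sysstat-collect.timer.d}
    mkdir -p "$dir"
    # The empty OnCalendar= line clears the packaged schedule first;
    # without it the new schedule would be added alongside the old one.
    printf '[Timer]\nOnCalendar=\nOnCalendar=*:00/%s\n' "$minutes" > "$dir/override.conf"
    # Show what was written
    cat "$dir/override.conf"
}
```

From a root shell, write_timer_override 2 followed by systemctl daemon-reload and systemctl restart sysstat-collect.timer applies it.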

Keeping Data Longer

Default retention on Debian/Ubuntu is 7 days (the RHEL-family package typically ships with 28). For production systems, 28 days gives you four full weeks of history, which is useful for spotting weekly patterns or revisiting incidents you didn’t investigate right away.

sudo nano /etc/sysstat/sysstat

Find and update:

HISTORY=28
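To confirm the value that is actually in effect, you can read it back from whichever config file exists. The sketch below checks the Debian/Ubuntu path used above and then /etc/sysconfig/sysstat, which I am assuming is where RHEL-family systems keep it. The helper name show_history is hypothetical.

```shell
# Hypothetical helper: print the HISTORY setting from the first sysstat
# config file that has one. Extra candidate paths can be passed as arguments.
show_history() {
    for conf in "$@" /etc/sysstat/sysstat /etc/sysconfig/sysstat; do
        if [ -r "$conf" ]; then
            # grep exits non-zero when the setting is absent, so the echo
            # only runs for a file that really contains HISTORY=
            value=$(grep '^HISTORY=' "$conf") && { echo "$conf: $value"; return 0; }
        fi
    done
    echo "no HISTORY setting found" >&2
    return 1
}
```

Calling show_history with no arguments checks the two default locations.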

What Gets Collected

Data lives in /var/log/sysstat/ as daily binary files named sa01, sa12, etc. (day of month). List them with:

ls -lh /var/log/sysstat/

The storage footprint is tiny. On my 4GB RAM Ubuntu server at 2-minute intervals, each daily file stays under 2MB. A full month of history costs less than 60MB — a small price for the ability to investigate any incident from the last four weeks.
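Because the files are named by day of month, working out which one holds a given date is easy to script. A small sketch, assuming GNU date and the saDD naming described above; sa_file_for is a hypothetical helper name.

```shell
# Hypothetical helper: print the data file path for any date GNU date
# understands ("yesterday", "3 days ago", "2024-03-10", ...).
sa_file_for() {
    when=${1:-today}
    dir=${2:-/var/log/sysstat}
    # %d is the zero-padded day of month, matching names like sa01 and sa12
    echo "$dir/sa$(date -d "$when" +%d)"
}
```

For example, sar -u -f "$(sa_file_for yesterday)" reads last night's data without you having to work out the day number at 2 AM.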

Verification & Monitoring

Using sar to Replay an Incident

Classic scenario: something happened last night and you need to know what. Query sar with a time range:

# CPU usage for today between 10 PM and the end of the day
# (each daily file stops at midnight, so 23:59:59 is the latest useful end time)
sar -u -s 22:00:00 -e 23:59:59

# Check a specific date (e.g., the 10th of this month)
sar -u -s 22:00:00 -e 23:59:59 -f /var/log/sysstat/sa10

The -u flag reports CPU utilization. Key columns to watch:

%user — user-space CPU usage
%system — kernel CPU usage (high values suggest I/O or syscall issues)
%iowait — time waiting for I/O (consistently above 20% is a red flag)
%idle — free CPU capacity
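On a busy day the sar -u output runs to hundreds of rows, so it can help to filter for the samples that cross the %iowait threshold. The sketch below looks up the %iowait column from the header row rather than hard-coding its position, since the position shifts depending on whether sar prints an AM/PM field. The function name flag_iowait is hypothetical.

```shell
# Hypothetical filter: read "sar -u" output on stdin and print only the
# samples whose %iowait exceeds a threshold (default 20).
flag_iowait() {
    threshold=${1:-20}
    awk -v limit="$threshold" '
        # Header row: remember which column holds %iowait, then move on
        { for (i = 1; i <= NF; i++) if ($i == "%iowait") { col = i; next } }
        # Data rows: print samples above the limit, skipping the Average line
        col && NF >= col && $1 != "Average:" && ($col + 0) > (limit + 0) { print }
    '
}
```

Usage: sar -u -f /var/log/sysstat/sa10 | flag_iowait 20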

# Memory usage history
sar -r -s 22:00:00 -e 23:59:59

# Swap usage (critical for 4GB RAM servers)
sar -S -s 22:00:00 -e 23:59:59

Using iostat for Disk I/O Analysis

High %iowait in sar tells you something is blocking on disk. iostat tells you which device and how bad it is:

# Real-time: extended stats, 5 reports at 2-second intervals (the first report is the average since boot)
iostat -x 2 5

# Focus on a specific device
iostat -x sda 2 5

The critical columns in iostat -x output:

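r_await and w_await: average time (ms) for read and write requests, including time spent queued. Older sysstat releases show a single await column instead. Above 100ms on HDD or 20ms on SSD is worth investigating.
%util: device utilization. Near 100% usually means the disk is saturated, although devices that serve requests in parallel (SSDs, RAID arrays) can still have headroom at that figure.
r/s and w/s: reads and writes per second.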
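When you capture several iostat reports in a row, the number you usually want is the worst %util each device reached. Here is a sketch that keeps the peak per device; it finds the %util column by name, so it should not depend on which other columns your sysstat version prints. The function name peak_util is hypothetical.

```shell
# Hypothetical filter: read "iostat -x" output on stdin and print the
# highest %util seen for each device across all reports.
peak_util() {
    awk '
        # A blank line ends a report section, so forget the column position
        NF == 0 { col = 0; next }
        # Device header: locate the %util column by name
        $1 ~ /^Device:?$/ { for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
        # Device rows: keep the highest value per device
        col && NF >= col { v = $col + 0; if (!($1 in peak) || v > peak[$1]) peak[$1] = v }
        END { for (dev in peak) printf "%s %.2f\n", dev, peak[dev] }
    ' | sort
}
```

Usage: iostat -x 2 5 | peak_util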

For historical I/O data, go back to sar:

# Disk I/O history
sar -d -s 22:00:00 -e 23:59:59 -f /var/log/sysstat/sa10

Using mpstat for Per-CPU Analysis

Overall CPU at 40% doesn’t mean everything is fine. A single-threaded process can pin one core to 100% while the rest sit idle — and your server crawls. mpstat catches this:

# Show all CPUs, update every 2 seconds
mpstat -P ALL 2 5

On my production server, this revealed a PHP-FPM worker pinning CPU0 to 100% during peak traffic while CPU1, CPU2, and CPU3 sat idle. The fix was adjusting the PHP-FPM worker count and process affinity. I would never have found it with just top.
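To spot that kind of imbalance without reading every row, you can filter the mpstat output for individual cores that are almost out of idle time. A sketch, with the 10% idle cutoff as my own arbitrary default and pinned_cores as a hypothetical function name:

```shell
# Hypothetical filter: read "mpstat -P ALL" output on stdin and print the
# per-core rows whose %idle is below a cutoff (default 10).
pinned_cores() {
    max_idle=${1:-10}
    awk -v limit="$max_idle" '
        # Header row: record where the CPU and %idle columns are
        /%idle/ { for (i = 1; i <= NF; i++) { if ($i == "CPU") cpu = i; if ($i == "%idle") idle = i }; next }
        # Per-core rows only: skip the all aggregate and the Average block
        cpu && idle && NF >= idle && $1 != "Average:" && $cpu != "all" && ($idle + 0) < (limit + 0) { print }
    '
}
```

Usage: mpstat -P ALL 2 5 | pinned_cores 10. If rows show up here while the all line still looks healthy, you are probably looking at a single-threaded bottleneck.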

For historical per-CPU data:

# Historical CPU data broken down per core
sar -P ALL -s 22:00:00 -e 23:59:59 -f /var/log/sysstat/sa10

A Practical Incident Workflow

Five commands. That’s the actual sequence I run when investigating a past performance issue:

# Step 1: Find the spike — CPU overview for the whole day
sar -u -f /var/log/sysstat/sa10 | grep -v "^$"

# Step 2: Narrow the time window once you see the spike
sar -u -s 22:30:00 -e 23:30:00 -f /var/log/sysstat/sa10

# Step 3: Check memory — was RAM full, causing swap thrashing?
sar -r -s 22:30:00 -e 23:30:00 -f /var/log/sysstat/sa10

# Step 4: Check disk I/O — was something writing heavily?
sar -d -s 22:30:00 -e 23:30:00 -f /var/log/sysstat/sa10

# Step 5: Check network — was there a traffic spike?
sar -n DEV -s 22:30:00 -e 23:30:00 -f /var/log/sysstat/sa10
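Once you know the window, steps 2 to 5 are the same command with a different report flag, so they wrap neatly into one function. This is just a convenience sketch: incident_report and the SAR override variable are names I made up, and it relies on POSIX sh word splitting for the two-word -n DEV flag.

```shell
# Hypothetical wrapper: run the CPU, memory, disk and network reports
# for one data file and one time window.
# SAR is an optional override for the sar binary (for example a full path).
incident_report() {
    file=$1 start=$2 end=$3
    for flags in "-u" "-r" "-d" "-n DEV"; do
        echo "== sar $flags =="
        # $flags is deliberately unquoted so that "-n DEV" splits into two arguments
        ${SAR:-sar} $flags -s "$start" -e "$end" -f "$file"
    done
}
```

Usage: incident_report /var/log/sysstat/sa10 22:30:00 23:30:00 | less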

Quick Summary Report

Need everything at once? sar -A dumps the full daily report — CPU, memory, swap, I/O, network — in one scrollable output:

# Full report for a specific day's data file
sar -A -f /var/log/sysstat/sa10 | less

Useful for end-of-day review or when handing off an incident to a teammate who wasn’t online during it.

One Last Thing About Data Collection

Fresh install? You won’t have historical data yet — it takes time to build up the archive. Manually trigger the first collection to confirm everything is wired up correctly:

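# Trigger immediate data collection (Debian/Ubuntu wrapper script)
sudo /usr/lib/sysstat/debian-sa1 1 1
# If the wrapper exits without writing anything, try the underlying script:
#   sudo /usr/lib/sysstat/sa1 1 1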

# Verify data is being written
ls -lh /var/log/sysstat/sa$(date +%d)

Non-zero file size means sysstat is collecting data correctly.
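If you want that check in a script or a monitoring probe, the shell's -s test (file exists and is not empty) covers it. A minimal sketch, with check_sa_file as a hypothetical name and the Debian/Ubuntu data directory assumed as the default:

```shell
# Hypothetical check: succeed only if the data file exists and is non-empty.
# Defaults to today's file in the Debian/Ubuntu data directory.
check_sa_file() {
    file=${1:-/var/log/sysstat/sa$(date +%d)}
    if [ -s "$file" ]; then
        echo "OK: $file has data"
    else
        echo "MISSING OR EMPTY: $file" >&2
        return 1
    fi
}
```

The non-zero exit status on failure makes it easy to hook into whatever alerting you already have.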

Running sar with no flags shows the current day’s CPU data — a quick sanity check before the next 2 AM incident hits.