Predicting CPU and RAM Load with Python and Scikit-learn: ML for System Resource Monitoring

AI tutorial - IT technology blog

It Was 2 AM and the Server Was Already Dead

The alert fired at 1:47 AM. By the time I SSHed in, the application server had already OOM-killed three worker processes. CPU at 98%, RAM swapping hard. The root cause? A batch job that runs every night — the same one that had been running for months — suddenly needed twice the memory because the dataset crossed a threshold nobody was tracking.

We recovered in about 40 minutes. But what kept me up the rest of that night wasn’t “why did it crash.” It was “why didn’t we see it coming?”

That incident pushed me down a rabbit hole: what if we could predict resource exhaustion before it happens, instead of alerting after the damage is done?

The Real Problem: Reactive Monitoring Always Loses

Traditional monitoring works like this: CPU hits 90% → alert fires → engineer scrambles → service is already degraded. You’re always one step behind.

The data to predict that 2 AM crash was sitting in our metrics the whole time. CPU and RAM usage follow patterns — time of day, day of week, batch job schedules, traffic curves. A model trained on historical metrics can forecast “in 30 minutes, RAM will hit 85%” with reasonable accuracy. That gives you time to scale out, restart a leaky service, or defer a batch job before anything breaks.

Not theoretical. I’ve run this in production. False positive rates dropped ~60% compared to threshold-based alerting, and we caught two near-outages before users noticed a single slow request.

Three Approaches and Why I Chose the Simpler One

Before touching code, here’s an honest look at what I evaluated:

Option 1: Time-series forecasting (ARIMA, Prophet)

Facebook Prophet is popular for this use case. It handles seasonality well and requires no feature engineering. The downsides are real though: it’s overkill for per-minute server metrics, slow to train at scale, and the model output is hard to explain to ops teams when alerts start misbehaving.

Option 2: Deep learning (LSTM)

LSTMs can capture long-range dependencies in sequences. But they need large datasets, are finicky about hyperparameters, and need GPU time to train. For predicting CPU/RAM 15–30 minutes ahead, they don’t meaningfully beat simpler models on typical server workloads.

Option 3: Gradient Boosting with engineered time features

This is what stuck. Train a GradientBoostingRegressor on features extracted from the timestamp — hour, day of week, minute — plus a rolling window of recent usage. Fast to train, easy to retrain on a schedule, and explainable when something goes wrong. Works well with just a few weeks of data.

A simpler model that actually runs in production beats a sophisticated one that nobody maintains.

Building the Prediction Pipeline

Step 1: Collect metrics with psutil

Start with the data. If you don’t have a metrics stack yet, this script logs CPU and RAM every minute to a CSV:

import psutil
import csv
import time
from datetime import datetime

OUTPUT_FILE = "system_metrics.csv"

def collect_metrics(output_file):
    with open(output_file, "a", newline="") as f:
        writer = csv.writer(f)
        # Write header only for new files
        if f.tell() == 0:
            writer.writerow(["timestamp", "cpu_percent", "ram_percent", "ram_used_mb"])
        while True:
            ts = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            cpu = psutil.cpu_percent(interval=1)  # blocks ~1s while sampling
            ram = psutil.virtual_memory()
            writer.writerow([ts, cpu, ram.percent, ram.used // (1024 * 1024)])
            f.flush()
            time.sleep(59)  # plus the 1s sampling above ≈ one row per minute

if __name__ == "__main__":
    collect_metrics(OUTPUT_FILE)

Run this as a background service. Two to four weeks of data is enough to train a useful model. Already on Prometheus? Export what you need:

curl -G 'http://localhost:9090/api/v1/query_range' \
  --data-urlencode 'query=100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)' \
  --data-urlencode 'start=2025-01-01T00:00:00Z' \
  --data-urlencode 'end=2025-03-01T00:00:00Z' \
  --data-urlencode 'step=60s' \
  | python3 -c "import sys,json; data=json.load(sys.stdin); print('\n'.join([f'{v[0]},{v[1]}' for r in data['data']['result'] for v in r['values']]))"
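If you'd rather skip the shell pipeline, the same range query works from Python with requests. This is a sketch assuming the defaults above (Prometheus on localhost:9090, 60s step); `prometheus_to_rows` and `export_cpu_metrics` are names I've made up here, and you'll want to adjust the query to match your exporter's labels:

```python
import csv
import requests
from datetime import datetime, timezone

PROM_URL = "http://localhost:9090/api/v1/query_range"
QUERY = '100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)'

def prometheus_to_rows(payload):
    """Flatten a query_range JSON payload into (timestamp, value) rows."""
    rows = []
    for series in payload["data"]["result"]:
        for ts, value in series["values"]:
            # Prometheus returns Unix timestamps; convert to the same
            # string format the psutil collector writes
            stamp = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
            rows.append((stamp, float(value)))
    return rows

def export_cpu_metrics(start, end, output_file="system_metrics.csv"):
    resp = requests.get(
        PROM_URL,
        params={"query": QUERY, "start": start, "end": end, "step": "60s"},
        timeout=30,
    )
    resp.raise_for_status()
    with open(output_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "cpu_percent"])
        writer.writerows(prometheus_to_rows(resp.json()))
```

Note this CSV only has the cpu_percent column, so train on CPU alone unless you export the memory metrics too.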

Step 2: Feature engineering

Raw timestamps mean nothing to a regression model. Extract features that capture the cyclical nature of server load:

import pandas as pd
import numpy as np

def engineer_features(df, target_col="cpu_percent", horizon_minutes=30):
    df = df.copy()
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    df = df.sort_values("timestamp").reset_index(drop=True)

    # Time features
    df["hour"] = df["timestamp"].dt.hour
    df["minute"] = df["timestamp"].dt.minute
    df["day_of_week"] = df["timestamp"].dt.dayofweek
    df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)

    # Cyclical encoding avoids the discontinuity between hour 23 and hour 0
    df["hour_sin"] = np.sin(2 * np.pi * df["hour"] / 24)
    df["hour_cos"] = np.cos(2 * np.pi * df["hour"] / 24)
    df["dow_sin"] = np.sin(2 * np.pi * df["day_of_week"] / 7)
    df["dow_cos"] = np.cos(2 * np.pi * df["day_of_week"] / 7)

    # Rolling statistics over 5, 15, and 30-minute windows
    for window in [5, 15, 30]:
        df[f"{target_col}_roll_mean_{window}"] = df[target_col].rolling(window).mean()
        df[f"{target_col}_roll_std_{window}"] = df[target_col].rolling(window).std()
        df[f"{target_col}_roll_max_{window}"] = df[target_col].rolling(window).max()

    # Lag features
    for lag in [1, 5, 15, 30]:
        df[f"{target_col}_lag_{lag}"] = df[target_col].shift(lag)

    # Target: actual value N minutes from now
    df["target"] = df[target_col].shift(-horizon_minutes)

    df = df.dropna()
    return df
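The cyclical encoding deserves a quick illustration, since it's the least obvious step above. As raw integers, hour 23 and hour 0 are maximally far apart; projected onto the unit circle, they're neighbors:

```python
import numpy as np

def cyc(hour):
    """Map an hour of day onto the unit circle, as in engineer_features."""
    angle = 2 * np.pi * hour / 24
    return np.sin(angle), np.cos(angle)

def dist(a, b):
    """Euclidean distance between two encoded points."""
    return float(np.hypot(a[0] - b[0], a[1] - b[1]))

# Raw integers: |23 - 0| = 23, the largest possible gap.
# On the circle, the same pair is nearly touching:
print(dist(cyc(23), cyc(0)))   # ~0.26
print(dist(cyc(12), cyc(0)))   # 2.0, genuinely opposite times stay far apart
```

Without this, a tree split on `hour` treats 23:55 and 00:05 as unrelated, even though server load at those two minutes is almost always the same.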

Step 3: Train and evaluate the model

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_absolute_error
import joblib

def train_model(df, target_col="cpu_percent", horizon_minutes=30):
    feature_cols = [
        "hour_sin", "hour_cos", "dow_sin", "dow_cos", "is_weekend",
        f"{target_col}_roll_mean_5", f"{target_col}_roll_mean_15", f"{target_col}_roll_mean_30",
        f"{target_col}_roll_std_15", f"{target_col}_roll_max_30",
        f"{target_col}_lag_1", f"{target_col}_lag_5", f"{target_col}_lag_15",
    ]

    X = df[feature_cols]
    y = df["target"]

    # TimeSeriesSplit is non-negotiable — never shuffle time series data for validation
    tscv = TimeSeriesSplit(n_splits=5)
    mae_scores = []

    model = GradientBoostingRegressor(
        n_estimators=200,
        learning_rate=0.05,
        max_depth=4,
        subsample=0.8,
        random_state=42
    )

    for train_idx, val_idx in tscv.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        model.fit(X_train, y_train)
        preds = model.predict(X_val)
        mae_scores.append(mean_absolute_error(y_val, preds))

    print(f"Cross-validated MAE: {np.mean(mae_scores):.2f}% ± {np.std(mae_scores):.2f}%")

    # Retrain on full dataset for the production model
    model.fit(X, y)
    joblib.dump(model, f"model_cpu_{horizon_minutes}min.pkl")
    print(f"Model saved: model_cpu_{horizon_minutes}min.pkl")
    return model, feature_cols

An MAE of 5–8% on CPU prediction 30 minutes out is realistic and actionable. If you’re seeing 15%+, the most likely culprits are insufficient training data or irregular spikes that need extra context features — like a boolean flag for “is a batch job running right now?”
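That batch-job flag is cheap to add if you know the schedule. A hypothetical sketch: `add_batch_job_flag` and the 02:00–03:30 window are invented for illustration, so substitute your own job windows:

```python
import pandas as pd

def add_batch_job_flag(df, start="02:00", end="03:30"):
    """Flag rows falling inside a known batch job window (hypothetical schedule)."""
    df = df.copy()
    t = pd.to_datetime(df["timestamp"])
    minutes = t.dt.hour * 60 + t.dt.minute
    start_m = int(start[:2]) * 60 + int(start[3:])
    end_m = int(end[:2]) * 60 + int(end[3:])
    df["batch_job_running"] = ((minutes >= start_m) & (minutes < end_m)).astype(int)
    return df
```

Append "batch_job_running" to feature_cols and retrain; the model can then learn that load behaves differently inside the window instead of treating the nightly spike as noise.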

Step 4: Run predictions and alert

import joblib
import requests

def predict_and_alert(model, feature_cols, threshold=80.0, slack_webhook=None):
    df = pd.read_csv("system_metrics.csv")
    # horizon_minutes=0 keeps the newest rows: with the training horizon,
    # engineer_features drops the last `horizon_minutes` rows when it
    # builds the target, and iloc[-1] would be 30 minutes stale
    df = engineer_features(df, horizon_minutes=0)

    if df.empty:
        return

    latest = df[feature_cols].iloc[[-1]]
    predicted_cpu = float(np.clip(model.predict(latest)[0], 0, 100))

    print(f"Predicted CPU in 30min: {predicted_cpu:.1f}%")

    if predicted_cpu >= threshold and slack_webhook:
        message = (
            f":warning: *Predicted CPU spike*: {predicted_cpu:.1f}% in ~30 minutes\n"
            f"Current CPU: {df['cpu_percent'].iloc[-1]:.1f}%\n"
            f"Action: Consider scaling out or deferring batch jobs."
        )
        requests.post(slack_webhook, json={"text": message})

if __name__ == "__main__":
    model = joblib.load("model_cpu_30min.pkl")
    feature_cols = [...]  # same list used during training
    predict_and_alert(model, feature_cols, threshold=80.0, slack_webhook="https://hooks.slack.com/...")

Schedule it with cron to run every 5 minutes:

*/5 * * * * /usr/bin/python3 /opt/monitoring/predict_and_alert.py >> /var/log/cpu_predictor.log 2>&1

What I Learned Running This in Production

Some lessons cost time to learn:

  • Retrain on a schedule. Traffic patterns shift — deployments, new features, seasonal load. I retrain weekly on the last 30 days. A model that’s 3 months stale will drift and start crying wolf constantly.
  • Don’t fire on a single prediction. Require the predicted value to stay above the threshold for 3 consecutive checks before alerting. One noisy reading is noise. Three in a row is a trend.
  • Model RAM and CPU separately. Their failure modes are different. RAM leaks grow monotonically — a slow creep that benefits from longer lag windows. CPU spikes are impulsive and mean-reverting. One model won’t capture both well.
  • Check feature importance after every retrain. Inspect model.feature_importances_. If rolling means dominate every other feature, you may be overfitting to short-term history and missing deeper weekly patterns.
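The second bullet, three consecutive breaches before alerting, is only a few lines of state. A minimal sketch; note that under the cron setup each run is a fresh process, so in practice you'd persist the history to a small file between runs or run the predictor as a long-lived loop:

```python
from collections import deque

class AlertDebouncer:
    """Alert only after `required` consecutive predictions breach the threshold."""

    def __init__(self, threshold=80.0, required=3):
        self.threshold = threshold
        self.history = deque(maxlen=required)

    def should_alert(self, predicted):
        self.history.append(predicted >= self.threshold)
        # Every one of the last `required` checks must be a breach
        return len(self.history) == self.history.maxlen and all(self.history)
```

One noisy 85% reading stays silent; three breaches in a row fire the alert, and any dip below the threshold resets the streak.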

This setup now runs on three of our servers. Since deploying it, zero surprise OOM events — every memory issue has been caught at least 20 minutes before it became critical. The 2 AM pages have basically stopped.

The whole pipeline — data collection, feature engineering, training, Slack alerts — fits in under 200 lines of Python. No Kubernetes, no GPU cluster, no data science team required. You need historical metrics, a free afternoon, and the discipline to retrain the model when it starts lying to you.
