XDP eBPF on Linux: Building a High-Performance Packet Filter and Load Balancer with Microsecond Latency

Networking tutorial - IT technology blog

Six Months with XDP in Production: What Actually Changed

When I swapped our iptables-based DDoS filter for an XDP program on a 10 Gbps edge server, the team’s first reaction was skepticism. eBPF had been on my radar for years, but XDP felt like overkill for a mid-sized setup. Six months later, it is the best infrastructure call I have made on this stack — rock solid through traffic spikes that used to trigger emergency rate limiting or upstream nullrouting.

This article covers how XDP compares to traditional packet processing, where it actually wins (and where it does not), how to get a working environment up, and the implementation of both a packet filter and a simple Layer 4 load balancer.

Approach Comparison: XDP vs iptables vs DPDK

Before touching code, it helps to understand where each tool sits in the Linux networking stack. The architecture difference is what explains the performance gap.

iptables / nftables

iptables processes packets inside the kernel’s netfilter framework. By the time a packet reaches your rules, the kernel has already allocated an sk_buff (socket buffer), copied the packet into kernel memory, and walked through several network stack layers.

Once the connection-tracking module is loaded, every packet also passes through conntrack unless you exempt it with a notrack rule, whether or not your own rules are stateful. At 1 million packets per second, that overhead compounds quickly: CPU usage climbs, and you should expect latency in the 60–120 microsecond range even for simple DROP rules.

DPDK (Data Plane Development Kit)

DPDK bypasses the kernel entirely, polling the NIC directly from userspace. Throughput is impressive — line rate on 100 Gbps cards is achievable. The catch: you dedicate full CPU cores to polling, and DPDK takes exclusive ownership of the NIC.

Your normal Linux networking stack stops working on that interface. Writing DPDK code also means managing memory pools, mbuf structures, and ring buffers yourself. In practice, that means a specialist team, careful capacity planning, and a separate management interface just to SSH in.

XDP (eXpress Data Path)

XDP runs an eBPF program at the earliest possible point in the kernel’s receive path — directly inside the NIC driver, before any sk_buff is allocated. Packets that should be dropped never touch the rest of the stack. Packets that need to be redirected or modified get processed with zero copy.

Unlike DPDK, XDP coexists with the normal Linux network stack. Interfaces stay visible to the OS. You can still SSH into the machine, and tools like ip, ss, and tcpdump keep working on non-intercepted traffic.

Benchmark numbers from my setup (Intel X710, Xeon E-2288G, Debian 12):

Tool          | 64-byte UDP flood | CPU (1 core) | Latency (p99)
-----------   | ----------------- | ------------ | -------------
iptables DROP | ~800 Kpps         | 95%          | 80–120 µs
nftables DROP | ~1.1 Mpps         | 88%          | 60–90 µs
XDP DROP      | ~14 Mpps          | 12%          | 1–3 µs

That 14 million packets per second DROP rate on a single core, at under 3 microseconds p99 latency, is not a lab artifact. Those numbers came from xdp-bench drop during a real UDP amplification attack that peaked at 9 Mpps sustained.
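
For context, the ~14 Mpps figure sits close to the theoretical ceiling of a 10 Gbps link for minimum-size frames. A quick back-of-the-envelope check in Python, assuming standard Ethernet framing overhead of 20 bytes per frame (preamble, start-of-frame delimiter, and inter-frame gap), which the benchmark above does not spell out:

```python
# Back-of-the-envelope: maximum packet rate of an Ethernet link.
# Assumes 20 bytes of per-frame wire overhead (preamble + SFD + inter-frame gap).

WIRE_OVERHEAD_BYTES = 20

def max_pps(link_bps: float, frame_bytes: int) -> float:
    """Theoretical packets per second for a given link speed and frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire

def per_packet_budget_ns(pps: float) -> float:
    """Time available to handle one packet at a given rate, in nanoseconds."""
    return 1e9 / pps

line_rate = max_pps(10e9, 64)
print(f"10 GbE, 64-byte frames: {line_rate / 1e6:.2f} Mpps")
print(f"Budget at 14 Mpps: {per_packet_budget_ns(14e6):.1f} ns per packet")
```

Under those assumptions the link tops out at roughly 14.88 Mpps, and a single core keeping up at 14 Mpps has about 71 ns per packet. That budget is the reason skipping the sk_buff allocation matters so much.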

Pros and Cons

Why XDP is worth the investment

  • Kernel integration without kernel modules — eBPF programs are verified and loaded safely; no kernel recompile needed.
  • Normal Linux networking still works — unlike DPDK, your SSH session survives.
  • eBPF maps for live state updates — add IPs to a blocklist, adjust backend weights, read per-CPU counters, all from userspace while the XDP program keeps running. Zero restarts.
  • Multiple attach modes: native (driver-level, fastest), offloaded (the NIC itself runs the program), or generic (software fallback, works on any NIC, slower).
  • Composable — chain multiple programs using libxdp’s program dispatcher.

Where XDP falls short

  • Learning curve is steep — eBPF C is restricted: no unbounded loops, no dynamic memory allocation, 512-byte stack limit. The verifier rejects programs in ways that feel opaque until you have a few hours of errors under your belt.
  • Native mode requires driver support — Intel, Mellanox, and Broadcom mainstream NICs are fine, but older or budget hardware falls back to generic mode, which is slower.
  • Debugging is harder than with iptables: bpf_trace_printk exists but carries overhead; production debugging relies on eBPF maps and bpftool.
  • Ingress only, no conntrack — XDP handles inbound packets. For full stateful firewall behavior, you still need nftables, or you combine XDP with TC (Traffic Control) eBPF for egress.

Recommended Setup

The pragmatic starting point for most teams:

  • Use XDP for ingress filtering — blocklists, rate limiting, DDoS mitigation
  • Keep nftables for stateful rules, outbound filtering, and DNAT
  • Use XDP redirect + AF_XDP for userspace packet processing when you need it
  • Manage XDP programs with libxdp (from the xdp-tools project) rather than calling the raw bpf() syscall yourself

Minimum kernel version: 5.10+ for stable XDP support. Kernel 6.1+ (Debian 12, Ubuntu 22.04 HWE) is the better target — it ships with multi-prog support and improved map helpers.
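
A small sketch for checking a host against those two thresholds before rollout. The function names and the three labels are hypothetical; the version cutoffs are the ones stated above. It only compares major and minor numbers and says nothing about distribution backports.

```python
import platform
import re

MIN_KERNEL = (5, 10)        # minimum stated in this article
PREFERRED_KERNEL = (6, 1)   # preferred target stated in this article

def parse_kernel_release(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '6.1.0-18-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))

def classify_kernel(release: str) -> str:
    """Label a kernel release as 'too old', 'supported', or 'preferred'."""
    version = parse_kernel_release(release)
    if version < MIN_KERNEL:
        return "too old"
    if version < PREFERRED_KERNEL:
        return "supported"
    return "preferred"

if __name__ == "__main__":
    try:
        release = platform.release()
        print(f"{release}: {classify_kernel(release)}")
    except ValueError as err:
        print(err)
```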

Implementation Guide

1. Install dependencies

# Debian 12 / Ubuntu 22.04
# (on Ubuntu, bpftool may ship in linux-tools-$(uname -r) instead of a bpftool package)
apt install -y clang llvm libelf-dev libbpf-dev linux-headers-$(uname -r) \
  bpftool xdp-tools iproute2 gcc make

# Verify kernel BPF support
bpftool feature probe | grep -E 'xdp|prog_type'

2. Write a basic XDP packet filter in eBPF C

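This program drops UDP packets with source port 53 that come from source IPs in a blocklist, a direct counter to DNS amplification attacks:

// xdp_filter.c
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>      // IPPROTO_UDP
#include <linux/ip.h>
#include <linux/udp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

// LPM trie key: prefix length, then the IPv4 address in network byte order.
// struct bpf_lpm_trie_key alone has no room for the address, so the map
// needs a key type that includes it.
struct lpm_v4_key {
    __u32 prefixlen;
    __u32 addr;
};

// eBPF map: blocklist of source IPs (LPM trie for CIDR support)
struct {
    __uint(type, BPF_MAP_TYPE_LPM_TRIE);
    __uint(max_entries, 10000);
    __type(key, struct lpm_v4_key);
    __type(value, __u32);
    __uint(map_flags, BPF_F_NO_PREALLOC);
} blocklist SEC(".maps");

SEC("xdp")
int xdp_filter_prog(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    // Every header access must be bounds-checked or the verifier rejects the program
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;
    if (ip->protocol != IPPROTO_UDP)
        return XDP_PASS;
    // Keep the parser simple: packets carrying IP options are passed unfiltered
    if (ip->ihl != 5)
        return XDP_PASS;

    // Only DNS responses (source port 53) are candidates for dropping
    struct udphdr *udp = (void *)(ip + 1);
    if ((void *)(udp + 1) > data_end)
        return XDP_PASS;
    if (udp->source != bpf_htons(53))
        return XDP_PASS;

    // LPM lookup on source IP: a /32 key matches any covering prefix in the map
    struct lpm_v4_key key = { .prefixlen = 32, .addr = ip->saddr };

    if (bpf_map_lookup_elem(&blocklist, &key))
        return XDP_DROP;

    return XDP_PASS;
}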

char _license[] SEC("license") = "GPL";
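
The LPM trie is what lets one map entry cover a whole CIDR range: the program always looks up a /32 key, and the lookup returns the entry with the longest prefix covering that address. Below is a userspace model of that matching rule in Python, using only the standard ipaddress module. It illustrates the semantics, not the kernel's trie implementation, and the sample prefixes and the `lpm_lookup` name are hypothetical.

```python
import ipaddress

def lpm_lookup(table: dict, ip_str: str):
    """Return the value stored under the longest prefix covering ip_str, or None."""
    addr = ipaddress.ip_address(ip_str)
    best_net, best_value = None, None
    for cidr, value in table.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_value = net, value
    return best_value

# Hypothetical blocklist: one range and one more specific host entry
blocklist = {
    "198.51.100.0/24": "range",
    "198.51.100.42/32": "host",
}

print(lpm_lookup(blocklist, "198.51.100.42"))  # most specific entry wins
print(lpm_lookup(blocklist, "198.51.100.7"))   # covered by the /24
print(lpm_lookup(blocklist, "203.0.113.9"))    # no covering prefix
```

The XDP program only cares whether the lookup returns anything at all, so in this model any non-None result corresponds to XDP_DROP.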

3. Compile and load

# Compile to BPF bytecode (-g emits the BTF that libbpf needs for the .maps section)
clang -O2 -g -target bpf -c xdp_filter.c -o xdp_filter.o \
  -I/usr/include/$(uname -m)-linux-gnu

# Attach to interface (native mode preferred)
ip link set dev eth0 xdpdrv obj xdp_filter.o sec xdp

# Verify it loaded
bpftool net show dev eth0
bpftool map list

4. Manage the blocklist from userspace (Python)

Once the program is running, you can add or remove IPs without reloading anything. The data plane keeps running; you are just updating a map:

#!/usr/bin/env python3
# manage_blocklist.py — uses bpftool, no extra Python dependencies needed
import socket
import subprocess

BLOCKLIST_MAP = "/sys/fs/bpf/blocklist"  # pinned map path

def block_ip(ip_str: str):
    """Add a /32 host entry to the XDP blocklist map."""
    packed = socket.inet_aton(ip_str).hex()
    # key: prefixlen (32, little-endian 4 bytes) + addr (4 bytes)
    key_hex = "20000000" + packed  # 0x20 = 32 in LE
    value_hex = "01000000"
    subprocess.run([
        "bpftool", "map", "update", "pinned", BLOCKLIST_MAP,
        "key", "hex", *[key_hex[i:i+2] for i in range(0, len(key_hex), 2)],
        "value", "hex", *[value_hex[i:i+2] for i in range(0, len(value_hex), 2)]
    ], check=True)
    print(f"Blocked: {ip_str}")

def unblock_ip(ip_str: str):
    packed = socket.inet_aton(ip_str).hex()
    key_hex = "20000000" + packed
    subprocess.run([
        "bpftool", "map", "delete", "pinned", BLOCKLIST_MAP,
        "key", "hex", *[key_hex[i:i+2] for i in range(0, len(key_hex), 2)]
    ], check=True)
    print(f"Unblocked: {ip_str}")

if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3 and sys.argv[1] == "block":
        block_ip(sys.argv[2])
    elif len(sys.argv) == 3 and sys.argv[1] == "unblock":
        unblock_ip(sys.argv[2])
    else:
        sys.exit(f"usage: {sys.argv[0]} block|unblock <ipv4-address>")

# Pin the map so userspace tools can access it by path (run once, before using the script)
bpftool map pin id $(bpftool map list | grep blocklist | awk '{print $1}' | tr -d ':') \
  /sys/fs/bpf/blocklist

# Block and unblock IPs on the fly — no service restart
python3 manage_blocklist.py block 198.51.100.42
python3 manage_blocklist.py unblock 198.51.100.42
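
Because the map is an LPM trie, the same mechanism can block whole prefixes rather than single hosts. Here is a sketch of a helper that turns a CIDR string into the byte list `bpftool map update ... key hex` expects. It assumes the same key layout as the script above: a 4-byte prefix length in little-endian order (so a little-endian host), followed by the address in network byte order. The function name `lpm_key_bytes` is hypothetical.

```python
import ipaddress
import struct

def lpm_key_bytes(cidr: str) -> list:
    """Encode an IPv4 CIDR as two-digit hex byte strings for a bpftool LPM key."""
    net = ipaddress.ip_network(cidr, strict=False)
    if net.version != 4:
        raise ValueError("this sketch only handles IPv4")
    # "<I": prefix length as a little-endian u32 (assumes a little-endian host)
    raw = struct.pack("<I", net.prefixlen) + net.network_address.packed
    return [f"{b:02x}" for b in raw]

print(" ".join(lpm_key_bytes("198.51.100.0/24")))
```

The returned list can be spliced into the bpftool argument list in place of the hand-built key. A bare address such as 198.51.100.42 is treated as a /32.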

5. Simple Layer 4 load balancer using XDP_TX

A production L4 LB deserves its own article. The core idea fits in a sentence: parse the destination port, pick a backend from a BPF array map using a hash of the 5-tuple, rewrite the destination IP and MAC, and return XDP_TX to retransmit on the same interface. Katran (Meta’s open-source L4 LB, handling tens of millions of packets per second) and Cilium both build their load balancing on XDP, although Katran forwards by encapsulating packets rather than rewriting the destination IP. The concept scales further than most teams will ever need.
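
The backend-selection step can be modeled in a few lines of userspace Python. This is a sketch of the hashing idea only, not an XDP program: the backend addresses are hypothetical, a Python list stands in for the BPF array map, and CRC32 stands in for whatever hash a real data plane would use.

```python
import socket
import struct
import zlib

# Hypothetical backend table; in the XDP program this would be a BPF array map.
BACKENDS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

def pick_backend(src_ip, dst_ip, src_port, dst_port, proto, backends=BACKENDS):
    """Map a flow's 5-tuple to a backend via a hash, then modulo the table size."""
    five_tuple = struct.pack(
        "!4s4sHHB",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        src_port,
        dst_port,
        proto,
    )
    # CRC32 is an assumption for illustration, not the hash any real LB uses.
    index = zlib.crc32(five_tuple) % len(backends)
    return backends[index]

flow = ("203.0.113.7", "192.0.2.10", 40000, 443, socket.IPPROTO_TCP)
print(pick_backend(*flow))
```

Every packet of a flow hashes to the same backend, which is what keeps a TCP connection pinned to one server without per-connection state. Plain modulo does reshuffle most flows when the backend list changes, which is why production balancers tend to use a consistent-hashing scheme instead.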

# Check per-CPU drop/pass counters from your XDP program
bpftool map dump id <stats_map_id>

# Detach XDP program cleanly
ip link set dev eth0 xdp off

6. Persist across reboots with systemd

# /etc/systemd/system/xdp-filter.service
[Unit]
Description=XDP Packet Filter
After=network.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/ip link set dev eth0 xdpdrv obj /opt/xdp/xdp_filter.o sec xdp
ExecStop=/usr/sbin/ip link set dev eth0 xdp off

[Install]
WantedBy=multi-user.target

# Reload systemd and enable the service
systemctl daemon-reload
systemctl enable --now xdp-filter

Monitoring and Observability

Add per-CPU counters to your XDP programs from day one. A dropped packet with no counter attached is a debugging session you do not want to have at 2am during an incident.

# Watch XDP drop counters live (if you exported them as a map)
watch -n 1 'bpftool map dump name xdp_stats'

# Interface-level RX/TX counters (not XDP-specific)
ip -s link show dev eth0 | grep -A2 'RX\|TX'

# bpftrace one-liner: count XDP exception events (aborted or failed actions) by action code
bpftrace -e 'tracepoint:xdp:xdp_exception { @[args->act] = count(); }'
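
bpftool prints one value per CPU for per-CPU maps, so something has to add them up. Here is a minimal sketch that sums per-CPU counters from `bpftool -j map dump` output. The JSON shape it expects (a list of entries, each with a `key` and a `values` list of per-CPU byte arrays written as hex strings) is an assumption based on bpftool's output for maps without BTF type information, so check it against your bpftool version. The counters are assumed to be 64-bit little-endian integers, and the sample data is hand-written.

```python
import json

def sum_percpu(dump_json: str) -> dict:
    """Sum per-CPU integer counters for each key in a bpftool JSON map dump."""
    totals = {}
    for entry in json.loads(dump_json):
        key = int.from_bytes(bytes(int(b, 16) for b in entry["key"]), "little")
        total = 0
        for cpu_slot in entry["values"]:
            raw = bytes(int(b, 16) for b in cpu_slot["value"])
            total += int.from_bytes(raw, "little")
        totals[key] = total
    return totals

# Hand-written sample in the assumed format: key 1, two CPUs, counts 5 and 258.
sample = json.dumps([{
    "key": ["0x01", "0x00", "0x00", "0x00"],
    "values": [
        {"cpu": 0, "value": ["0x05", "0x00", "0x00", "0x00", "0x00", "0x00", "0x00", "0x00"]},
        {"cpu": 1, "value": ["0x02", "0x01", "0x00", "0x00", "0x00", "0x00", "0x00", "0x00"]},
    ],
}])
print(sum_percpu(sample))
```

In practice the input would be the stdout of `bpftool -j map dump name xdp_stats`, captured with subprocess, where `xdp_stats` is whatever name your own stats map uses.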

Closing Thoughts

After six months running XDP on production edge nodes, going back to pure iptables for ingress filtering is not something I would consider. The performance headroom means the same server handles traffic events that previously required emergency rate limiting or upstream nullrouting.

Raw throughput is only part of the story. What makes XDP compelling day-to-day is the combination: kernel-level performance with the operational model of a normal Linux server. Monitoring still works. SSH still works. tcpdump still works. The eBPF map interface lets you react to threats in real time from a Python script without touching the data plane program. For teams running their own edge infrastructure, that is a genuinely different operational posture, not just a faster version of what you already had.
