Deploying a Wireless Mesh Network with B.A.T.M.A.N. Advanced (batman-adv) on Linux: Self-Healing Infrastructure for HomeLab and IoT

Networking tutorial - IT technology blog

Six Months Running batman-adv in Production: What Actually Happened

I set up a batman-adv mesh across my home lab and a small IoT deployment back in late 2024. Three Raspberry Pi 4 nodes, roughly 40 connected devices, one wired gateway. The short version: it held up better than I expected. Nodes dropped, came back, rerouted — all without manual intervention. That alone made the whole experiment worthwhile.

This write-up covers what I compared before picking batman-adv, what actually works and what trips you up, the exact setup I settled on, and a step-by-step guide to replicate it on any Debian/Ubuntu system.

Mesh Networking Approaches: What’s Out There

Before committing to batman-adv, I looked at three realistic options for a Linux-native, open-source mesh setup:

802.11s (IEEE Standard Mesh)

The kernel-level 802.11s protocol is built into the Linux wireless stack. It handles path selection at the MAC layer using the Hybrid Wireless Mesh Protocol (HWMP). Setup is done with iw and wpa_supplicant. On paper, it’s the cleanest choice — fully standardized, no extra kernel module needed.

Real-world use is messier. I had three Wi-Fi adapters — ath9k, mt76, and rtl8192eu — and only two of them negotiated mesh links reliably. The third would associate but never actually pass traffic. Debugging that cost me more time than building the batman-adv setup from scratch.

batman-adv (B.A.T.M.A.N. Advanced)

batman-adv operates at Layer 2 — it appears as a virtual Ethernet interface (bat0) on top of your physical wireless interfaces. Routing decisions happen inside the kernel module based on link quality metrics (Transmit Quality, TQ). Nodes broadcast OGMs (Originator Messages) to announce themselves and measure path quality to every other reachable node.

Unlike 802.11s, batman-adv is transport-agnostic. Run it over Wi-Fi, Ethernet, or both at once. That flexibility was critical for my setup where some IoT nodes are wired and others wireless — the mesh treats them all the same.

OLSR / Babel

OLSR (Optimized Link State Routing) and Babel are Layer 3 routing protocols. They’re excellent for larger deployments where you need IP-level routing control and integration with BGP or OSPF. For a home lab or IoT mesh, though, where the goal is transparent Layer 2 bridging — every device on the same subnet — that’s added complexity you don’t need. More moving parts, same result.

batman-adv: Honest Pros and Cons

What Works Well

  • Self-healing routing: When a node goes down, traffic reroutes fast — typically 3–8 seconds in my tests. I’ve pulled power from nodes mid-transfer and watched the session recover cleanly.
  • Transport agnostic: The same bat0 interface works over Wi-Fi, Ethernet, or both simultaneously. Mixed wired/wireless nodes just work.
  • Layer 2 transparency: Every node shares the same broadcast domain. DHCP, mDNS, ARP — all function normally without extra tunneling or routing config.
  • Kernel module maturity: batman-adv has been in the mainline kernel since 2.6.38. On modern Ubuntu/Debian, it’s a modprobe away.
  • Low overhead on small hardware: Raspberry Pi 3B nodes running batman-adv with a USB Wi-Fi adapter used under 8% CPU during normal mesh operation.
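
The reroute times above can be checked with nothing more than a ping log. The sketch below is one way to do it, not the method used for the numbers in this post: it assumes the log was captured with `ping -i 1`, so each missing `icmp_seq` value corresponds to roughly one second without connectivity. The function name `longest_gap` is made up for this example.

```shell
# Sketch: estimate reroute time from a saved ping log.
# Assumption: the log came from `ping -i 1 <ip>`, so one missing icmp_seq ~ 1 second lost.

longest_gap() {
    # Reads ping output on stdin and prints the largest run of missing icmp_seq values.
    awk '
        match($0, /icmp_seq=[0-9]+/) {
            # Skip the 9 characters of "icmp_seq=" and keep the digits.
            seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
            if (seen && seq - prev - 1 > max) max = seq - prev - 1
            prev = seq
            seen = 1
        }
        END { print max + 0 }
    '
}
```

Typical use: run `ping -i 1 10.10.10.2 | tee ping.log` from one node (the address is a placeholder), pull power on an intermediate node, stop the ping once replies resume, then run `longest_gap < ping.log`.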

Where It Falls Short

  • No native encryption: batman-adv itself doesn’t encrypt traffic. Handle encryption separately, either with WPA2 on the underlying wireless interface or with a VPN overlay like WireGuard.
  • Debugging takes a learning curve: The batctl tool is excellent, but understanding why a specific path is chosen requires reading TQ values and OGM tables. Not intuitive the first few times.
  • Broadcast/multicast overhead: In large meshes (20+ nodes), broadcast traffic can get noisy. For small setups this is a non-issue, but it’s something to track as you scale.
  • Driver compatibility: Not every Wi-Fi adapter supports the ad-hoc (IBSS) or 802.11s mesh modes that batman-adv needs on wireless links. ath9k and mt76 drivers have consistent support; many cheap USB adapters do not.

Recommended Setup

After a few weeks of iteration, here’s what I settled on — it’s been running without issues since:

  • Hardware: Raspberry Pi 4 nodes (×3), TP-Link TL-WN722N USB adapters (ath9k_htc driver), one node with a wired uplink acting as the gateway.
  • OS: Ubuntu 22.04 LTS (all nodes identical image)
  • Interface mode: Ad-hoc (IBSS) on 5 GHz channel 36, SSID batman-mesh
  • batman-adv version: 2021.4 (kernel module) + batctl matching version
  • Encryption: WireGuard overlay for inter-node privacy
  • DHCP: Single dnsmasq instance on the gateway node, serving the entire bat0 broadcast domain

Implementation Guide

Step 1: Install batman-adv and batctl

# On every mesh node
sudo apt update
sudo apt install batctl bridge-utils wireless-tools wpasupplicant

# Load the batman-adv kernel module
sudo modprobe batman-adv

# Make it load at boot
echo 'batman-adv' | sudo tee -a /etc/modules
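
Before moving on, it is worth confirming the module actually loaded. A minimal sketch, assuming a standard `/proc/modules` layout; the helper name `module_loaded` is hypothetical, and the optional second argument exists only so the function can be pointed at a test file.

```shell
# Return 0 if the named module appears in a /proc/modules-style file.
module_loaded() {
    local name file
    # The kernel lists modules with underscores, so batman-adv shows up as batman_adv.
    name=$(printf '%s' "$1" | tr '-' '_')
    file="${2:-/proc/modules}"
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}
```

For example, `module_loaded batman-adv && batctl -v` prints the batctl version only when the module is present.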

Step 2: Configure the Wireless Interface for Ad-Hoc Mode

Replace wlan0 with your actual interface name. Check with ip link first.

# Bring interface down before changing mode
sudo ip link set wlan0 down

# Set ad-hoc (IBSS) mode
sudo iwconfig wlan0 mode ad-hoc
sudo iwconfig wlan0 essid batman-mesh
sudo iwconfig wlan0 ap any
sudo iwconfig wlan0 channel 36

# Bring interface back up (no IP on wlan0; addresses are assigned to bat0 later)
sudo ip link set wlan0 up
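
The iwconfig commands come from the older wireless-tools package. On drivers that respond better to nl80211, the same join can be sketched with `iw`, which takes a frequency in MHz rather than a channel number. This is an alternative under stated assumptions, not the setup used in this post: the helper names `chan_to_freq` and `join_ibss` are made up here, and the channel-to-frequency mapping covers only the common 2.4 GHz and 5 GHz channels.

```shell
# Convert a Wi-Fi channel number to its centre frequency in MHz.
chan_to_freq() {
    local ch="$1"
    if   [ "$ch" -ge 1 ] && [ "$ch" -le 13 ]; then echo $((2407 + 5 * ch))
    elif [ "$ch" -eq 14 ]; then echo 2484
    elif [ "$ch" -ge 32 ] && [ "$ch" -le 177 ]; then echo $((5000 + 5 * ch))
    else echo "unsupported channel: $ch" >&2; return 1
    fi
}

# Join an ad-hoc (IBSS) cell with iw instead of iwconfig. Run as root.
join_ibss() {
    local ifc="$1" ssid="$2" ch="$3" freq
    freq=$(chan_to_freq "$ch") || return 1
    ip link set "$ifc" down
    iw dev "$ifc" set type ibss
    ip link set "$ifc" up
    iw dev "$ifc" ibss join "$ssid" "$freq"
}
```

With the values from this guide, the call would be `join_ibss wlan0 batman-mesh 36`, which resolves channel 36 to 5180 MHz.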

Step 3: Add the Interface to batman-adv

# Add wlan0 as a batman-adv hard interface (the physical link under bat0)
sudo batctl if add wlan0

# If you have a wired interface to include too:
sudo batctl if add eth1

# Verify
sudo batctl if

Step 4: Bring Up the bat0 Interface

# Bring up the virtual mesh interface
sudo ip link set bat0 up

# On gateway node: assign a static IP
sudo ip addr add 10.10.10.1/24 dev bat0

# On all other nodes: use DHCP (once gateway has dnsmasq running)
sudo dhclient bat0

Step 5: Automate with systemd (Persistent Config)

Manual setup is fine for testing. For production, wire it up with a boot script and a systemd unit. Create /etc/systemd/system/batman-mesh.service:

[Unit]
Description=batman-adv Mesh Network Setup
After=network.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/batman-mesh-up.sh

[Install]
WantedBy=multi-user.target

Then create /usr/local/bin/batman-mesh-up.sh:

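#!/bin/bash
# Bring up the batman-adv mesh on one wireless interface (systemd runs this as root)
set -e

INTERFACE=wlan0
MESH_SSID=batman-mesh
CHANNEL=36

modprobe batman-adv

# Put the radio into ad-hoc (IBSS) mode on the mesh SSID and channel
ip link set "$INTERFACE" down
iwconfig "$INTERFACE" mode ad-hoc
iwconfig "$INTERFACE" essid "$MESH_SSID"
iwconfig "$INTERFACE" channel "$CHANNEL"
ip link set "$INTERFACE" up

# Attach the interface to the mesh and bring up bat0
batctl if add "$INTERFACE"
ip link set bat0 up

Make the script executable, then enable and start the service:

sudo chmod +x /usr/local/bin/batman-mesh-up.sh
sudo systemctl enable batman-mesh
sudo systemctl start batman-mesh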
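
One caveat with a oneshot unit ordered only after network.target: a USB Wi-Fi adapter may not have been enumerated yet when the script runs, in which case the first `ip link` call fails and `set -e` aborts the script. A small wait loop is one possible guard. The function name `wait_for_iface` and the 30-second default are assumptions for illustration; the third argument exists only so the function can be tested against a temporary directory.

```shell
# Wait until a network interface shows up under sysfs, or give up after a timeout.
wait_for_iface() {
    local ifc="$1" timeout="${2:-30}" sysfs="${3:-/sys/class/net}"
    local waited=0
    until [ -e "$sysfs/$ifc" ]; do
        # Stop waiting once the timeout (in seconds) has elapsed.
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In batman-mesh-up.sh this could sit just before the first `ip link` command, for example `wait_for_iface "$INTERFACE" 30 || exit 1`.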

Step 6: Monitor Mesh Status with batctl

batctl gives you actual visibility into what the mesh is doing. Once at least two nodes are up:

# List all reachable nodes (originators) and their TQ scores (0-255, higher = better)
sudo batctl o

# Show active mesh interfaces
sudo batctl if

# Real-time TQ monitoring (refresh every 1 second)
sudo batctl o -w 1

# Ping a node by MAC address (Layer 2 ping, not IP)
sudo batctl ping aa:bb:cc:dd:ee:ff

# Traceroute through the mesh
sudo batctl traceroute aa:bb:cc:dd:ee:ff

Sample output from batctl o on a three-node mesh:

[B.A.T.M.A.N. adv 2021.4, MainIF/MAC: wlan0/dc:a6:32:xx:xx:xx, mesh: bat0]
Originator         last-seen (#/255) Nexthop           [outif]
dc:a6:32:aa:bb:cc    0.132s (218) dc:a6:32:aa:bb:cc [wlan0]
e4:5f:01:11:22:33    0.420s (196) dc:a6:32:aa:bb:cc [wlan0]
e4:5f:01:44:55:66    0.890s (154) dc:a6:32:aa:bb:cc [wlan0]

TQ of 200+ means a solid direct link. Below 100, the path is going through multiple hops or the link quality has dropped. This is the first number I check when a node seems sluggish.
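
Checking that number by eye gets tedious once there are more than a few nodes. The sketch below filters originator output for entries under a TQ threshold. It assumes the column layout shown in the sample above (a MAC address, then the TQ in parentheses); other batctl versions may format the table differently, and the function name `weak_links` is made up for this example.

```shell
# Print "originator TQ" for every originator whose TQ is below the threshold.
weak_links() {
    local threshold="${1:-100}"
    awk -v t="$threshold" '
        # Data rows carry the TQ as a number in parentheses; header rows do not.
        match($0, /\( *[0-9]+\)/) {
            tq = substr($0, RSTART + 1, RLENGTH - 2) + 0
            # The first MAC address on the row is the originator.
            if (match($0, /([0-9a-f][0-9a-f]:)+[0-9a-f][0-9a-f]/) && tq < t)
                print substr($0, RSTART, RLENGTH), tq
        }
    '
}
```

Usage: `sudo batctl o | weak_links 150` lists every originator currently below TQ 150; no output means every path is at or above the threshold.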

Step 7: Configure dnsmasq on the Gateway Node

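sudo apt install dnsmasq

Create /etc/dnsmasq.d/batman-mesh.conf with the following contents:

# /etc/dnsmasq.d/batman-mesh.conf
interface=bat0
dhcp-range=10.10.10.100,10.10.10.200,12h
# Default gateway
dhcp-option=3,10.10.10.1
# DNS
dhcp-option=6,1.1.1.1

Then restart dnsmasq to apply the configuration:

sudo systemctl restart dnsmasq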
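
To confirm that devices across the mesh are actually getting leases, the dnsmasq lease file can be read directly. A minimal sketch, assuming the Debian/Ubuntu default path /var/lib/misc/dnsmasq.leases and the usual five-field layout (expiry epoch, MAC, IP, hostname, client ID); the function name `list_leases` is hypothetical.

```shell
# Print "IP MAC hostname" for each active lease in a dnsmasq lease file.
list_leases() {
    local file="${1:-/var/lib/misc/dnsmasq.leases}"
    # Fields: expiry-epoch MAC IP hostname client-id; hostname is "*" when unknown.
    awk '{ print $3, $2, $4 }' "$file"
}
```

Running `list_leases` on the gateway after a few clients have joined should show one line per device in the 10.10.10.100 to 10.10.10.200 range. If dnsmasq refuses to start instead, `dnsmasq --test` checks the configuration syntax without launching the daemon.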

A Few Things I’d Do Differently Now

Six months in, the setup is stable — but two things I’d change if starting over. First: WireGuard between nodes from day one, not bolted on later. Encrypting the mesh overlay is straightforward, and scripting the key exchange with Ansible across all nodes takes under an hour. Second: 5 GHz from the start. My initial 2.4 GHz setup had interference problems that vanished the moment I moved to channel 36.

One more thing worth calling out for IoT specifically. batman-adv’s Layer 2 transparency means Zigbee coordinators, MQTT brokers, and Home Assistant all find each other via mDNS across mesh nodes — zero extra routing config required. Getting that same behavior with a Layer 3 approach means wrestling with mDNS proxies or avahi forwarding. Avoiding that headache alone justified the choice.
