Context & Why: Ending the Dashboard Roulette
Six months ago, our infrastructure monitoring was a fragmented mess. We were juggling Zabbix for Linux servers, an aging MRTG instance for Cisco switches, and a collection of fragile Python scripts for our edge routers. Whenever a service blinked, my team spent the first ten minutes playing ‘dashboard roulette’ just to work out which tool held the relevant data. We needed one tool to rule them all: something that handled both SNMP and agent-based metrics without the manual overhead of legacy systems.
That is when we moved the entire stack to Checkmk. After half a year in production, the difference is night and day. Checkmk is built on the Open Monitoring Distribution (OMD). This bundles the core engine, a modern web UI, and graphing tools into a single package. For any sysadmin tired of 3 AM emergency calls, mastering OMD is a career game-changer. It provides the visibility needed to catch a failing power supply before it takes down a rack.
We opted for the Checkmk Raw Edition (CRE). It is fully open-source and currently manages over 250 of our nodes with ease. The real strength here is the rule-based configuration. Instead of manually clicking through every host to add a CPU check, you write one rule for a folder. Checkmk does the rest, saving us roughly five hours of configuration work every month.
Installation: Building a Stable Foundation on Ubuntu
I recommend a dedicated Ubuntu 24.04 LTS instance for the best experience. The installation is surprisingly streamlined because the package handles almost every dependency for you.
1. Preparing the System
Start with a clean slate. You will need wget to grab the binary and gdebi to ensure the local installation doesn’t choke on missing libraries.
sudo apt update && sudo apt upgrade -y
sudo apt install wget bash-completion gdebi-core -y
2. Downloading and Installing Checkmk
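At the time of writing, version 2.3.0 is the stable choice. I use gdebi because it automatically pulls in the specific Apache and Python modules required by the OMD environment. Checkmk ships one package per Ubuntu release, so the codename in the filename has to match your system: the jammy build below targets 22.04, and on 24.04 you should pick the matching build from the Checkmk download page instead.
# Grab the Raw Edition package (2.3.0p1, jammy build)
wget https://download.checkmk.com/checkmk/2.3.0p1/check-mk-raw-2.3.0p1_0.jammy_amd64.deb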
# Install and resolve dependencies
sudo gdebi check-mk-raw-2.3.0p1_0.jammy_amd64.deb
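To avoid grabbing a package built for the wrong release, the filename can be derived from the running system. This is a minimal sketch: cmk_pkg_url and ubuntu_codename are hypothetical helper names, the URL layout is simply copied from the wget line above, and it assumes a build exists for your codename and patch level.

```shell
# Sketch: build the Raw Edition download URL for a given version and codename.
# cmk_pkg_url is a hypothetical helper; the URL layout mirrors the wget line above.
cmk_pkg_url() {
    printf 'https://download.checkmk.com/checkmk/%s/check-mk-raw-%s_0.%s_amd64.deb\n' \
        "$1" "$1" "$2"
}

# Read the release codename (jammy, noble, ...) from /etc/os-release.
ubuntu_codename() {
    ( . /etc/os-release && printf '%s\n' "${VERSION_CODENAME:-unknown}" )
}

# Usage (not executed here):
#   wget "$(cmk_pkg_url 2.3.0p1 "$(ubuntu_codename)")"
#   omd version    # after installation, confirms which OMD version is installed
```

If wget returns a 404, that combination of version and codename is not published and you need to pick another patch level from the download page.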
3. Spinning Up Your First Monitoring Site
Checkmk uses ‘sites’ to keep environments isolated. You can run a production site and a test site on the same hardware. We named ours itfromzero.
# Create the instance
sudo omd create itfromzero
# Fire up the services
sudo omd start itfromzero
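The omd create command prints a randomly generated password for the default cmkadmin user, so note it before you move on. You can now log in as cmkadmin at http://your-server-ip/itfromzero/. The first login is always a ‘eureka’ moment for teams used to older, clunkier interfaces.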
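Before opening the browser, it is worth confirming that the site really is up. A minimal sketch, assuming it runs as root on the Checkmk server; site_state is a hypothetical wrapper that only looks at whether omd status exits with zero.

```shell
# site_state SITE: print "running" if `omd status SITE` exits with 0,
# otherwise "not running". Run as root (or via sudo) on the Checkmk server.
site_state() {
    if omd status "$1" >/dev/null 2>&1; then
        echo "running"
    else
        echo "not running"
    fi
}

# Usage (not executed here):
#   site_state itfromzero
# If the initial password is lost, reset it as the site user:
#   sudo omd su itfromzero
#   cmk-passwd cmkadmin
```

Running omd status itfromzero directly gives the per-service breakdown, which is more useful when the wrapper reports "not running".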
Configuration: Taming Servers and Network Gear
With the engine running, we divide our workflow into two paths: lightweight agents for compute and SNMP for the network.
Monitoring Linux Servers with the Checkmk Agent
For Linux, forget SNMP. Checkmk uses a dedicated agent that listens on TCP port 6556. It is lightweight: while SNMP can be chatty and CPU-intensive, the agent provides deep metrics, such as per-partition IOPS and specific process states, with negligible overhead.
Download the .deb from Setup > Agents > Linux in your site and install it on your target servers. The agent answers without further local configuration, although Checkmk 2.1 and later will show a warning on the host until you register the agent for TLS.
# On your target Linux VM
sudo gdebi check-mk-agent_2.3.0p1-1_all.deb
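In the web UI, add the host with its IP address and leave the agent setting on “API integrations if configured, else Checkmk agent” (the default). Run the service discovery and Checkmk will automatically list every mounted filesystem and network interface on that box. Accept the services and activate the pending changes to start monitoring.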
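When a host shows fewer services than expected, I find it quicker to look at the raw agent output than to click through the UI. The sketch below lists the section names the agent reports; list_sections is a hypothetical helper, and web01 and cmk.example.com are placeholder names.

```shell
# list_sections: read raw agent output on stdin and print the section names.
# Section headers look like <<<name>>> or <<<name:option(...)>>>.
list_sections() {
    sed -n 's/^<<<\([^:<>][^:<>]*\).*>>>$/\1/p'
}

# Usage on the monitored host (not executed here):
#   sudo check_mk_agent | list_sections
# Usage on the Checkmk server, as the site user, for a host already in Setup:
#   cmk -d web01 | list_sections
# Optional TLS registration on the monitored host (placeholder names):
#   sudo cmk-agent-ctl register --hostname web01 --server cmk.example.com \
#       --site itfromzero --user cmkadmin
```

If a section such as df or lnx_if is missing from the list, the problem is on the agent side rather than in the discovery.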
Monitoring Switches and Routers via SNMP
Checkmk handles networking gear with a level of detail that puts most open-source tools to shame. Whether it’s a 48-port Cisco Catalyst or a small MikroTik router, the workflow is identical. We recommend SNMPv3 for production, but a standard community string works for initial testing.
To add a switch:
- Go to Setup > Hosts > Add host.
- Input your hostname (e.g., Core-Switch-01).
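- Under Monitoring agents, set “Checkmk agent / API integrations” to “No API integrations, no Checkmk agent” for SNMP-only devices, then check “SNMP” and pick your version.
- Enter your community string in SNMP credentials.
- Save, run a service discovery, and activate the changes.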
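Most failed SNMP discoveries come down to a wrong community string or an ACL on the device, so it pays to test reachability from the Checkmk server first. A minimal sketch, assuming the net-snmp client tools are installed (the snmp package on Ubuntu) and SNMP v2c is in use; snmp_probe is a hypothetical helper and 192.0.2.10 is a documentation address.

```shell
# snmp_probe HOST COMMUNITY: ask the device for sysDescr.0 over SNMP v2c.
# Exits non-zero if the device does not answer within 2 seconds (1 retry).
snmp_probe() {
    snmpget -v2c -c "$2" -t 2 -r 1 "$1" 1.3.6.1.2.1.1.1.0
}

# Usage (not executed here):
#   snmp_probe 192.0.2.10 public && echo "SNMP reachable"
```

If this times out from the monitoring server, the discovery in the UI will fail for the same reason.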
After the discovery, Checkmk pulls port states, chassis temperatures, and even power supply status, provided the device exposes them over SNMP. The automatic traffic graphs, refreshed at every check interval, are a lifesaver for spotting bandwidth hogs.
Verification: The 6-Month Verdict
The ‘Main Dashboard’ is now my team’s morning cockpit. We have integrated it with Telegram. If a core router drops a BGP peer, I get a notification on my phone before the first user ticket is even created.
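For readers who want something similar, here is a minimal sketch of a custom notification script. It is not our production script: it assumes the file is placed in the site's local/share/check_mk/notifications/ directory and made executable, and that the bot token and chat ID are passed as the first two script parameters of the notification rule. The function names are hypothetical.

```shell
#!/bin/bash
# Telegram (sketch)
# Hypothetical notification script. Checkmk passes the notification context
# as NOTIFY_* environment variables and rule parameters as NOTIFY_PARAMETER_n.

# Build a one-line message for host or service notifications.
build_message() {
    if [ "${NOTIFY_WHAT:-}" = "SERVICE" ]; then
        printf '%s %s/%s is %s: %s\n' "${NOTIFY_NOTIFICATIONTYPE:-}" \
            "${NOTIFY_HOSTNAME:-}" "${NOTIFY_SERVICEDESC:-}" \
            "${NOTIFY_SERVICESTATE:-}" "${NOTIFY_SERVICEOUTPUT:-}"
    else
        printf '%s %s is %s: %s\n' "${NOTIFY_NOTIFICATIONTYPE:-}" \
            "${NOTIFY_HOSTNAME:-}" "${NOTIFY_HOSTSTATE:-}" "${NOTIFY_HOSTOUTPUT:-}"
    fi
}

# send_telegram TOKEN CHAT_ID TEXT: post via the Telegram Bot API sendMessage method.
send_telegram() {
    curl --silent --show-error --fail --max-time 10 \
        --data-urlencode "chat_id=$2" \
        --data-urlencode "text=$3" \
        "https://api.telegram.org/bot$1/sendMessage" >/dev/null
}

# Only send when the rule supplied both parameters (assumption: token, then chat ID).
if [ -n "${NOTIFY_PARAMETER_1:-}" ] && [ -n "${NOTIFY_PARAMETER_2:-}" ]; then
    send_telegram "$NOTIFY_PARAMETER_1" "$NOTIFY_PARAMETER_2" "$(build_message)"
fi
```

Keeping the message builder separate from the sender makes it easy to test the wording by exporting a few NOTIFY_ variables by hand before wiring the script into a rule.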
Why Service Discovery Matters
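The ‘Full Scan’ is the secret sauce. If you plug a new fiber SFP into a switch or add a virtual disk to a server, Checkmk notices the change the next time its periodic discovery check runs (every two hours by default), not on the regular one-minute poll. It flags added items as ‘New’ services that are not yet monitored and removed ones as ‘Vanished’. This ensures your monitoring stays accurate as your hardware evolves.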
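When I do not want to wait for the periodic check, the same discovery can be triggered from the command line as the site user. A minimal sketch; discover_new is a hypothetical wrapper and web01 is a placeholder host name.

```shell
# discover_new HOST: add newly found services for HOST, then reload the core
# so they are actually monitored. Run as the site user (omd su itfromzero).
discover_new() {
    cmk -I "$1" && cmk -O
}

# Usage (not executed here):
#   discover_new web01
# To preview what the periodic check would report without changing anything:
#   cmk --check-discovery web01
```

Note that cmk -I only adds services; vanished ones stay until you remove them in the UI or rediscover the host from scratch with cmk -II.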
Optimizing for Scale
As we scaled past 200 devices, we noticed the UI slowing down. By default, Checkmk polls every 60 seconds. For non-critical dev servers, we changed this to 5 minutes. This simple tweak dropped our monitoring server’s CPU load from 65% to a steady 20%. You can find this under Setup > Services > Service monitoring rules, in the rule ‘Normal check interval for service checks’.
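To get a feel for why the interval matters, here is a back-of-the-envelope calculation. It is a rough model that counts one fetch per host per interval and ignores the cost of individual checks; the host counts are illustrative, not our real split.

```shell
# fetches_per_minute HOSTS INTERVAL_SECONDS
# Rough model: every host is contacted once per check interval.
fetches_per_minute() {
    echo $(( $1 * 60 / $2 ))
}

# Illustrative numbers: 200 hosts, 80 of them non-critical dev servers.
fetches_per_minute 200 60     # everything at 60 s
fetches_per_minute 120 60     # critical hosts stay at 60 s
fetches_per_minute 80 300     # dev servers moved to 5 min
```

In this example the total drops from 200 to 136 fetches per minute. The actual CPU saving depends on how many services each host carries, so treat the numbers as a way to reason about the trend rather than a prediction.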
Conclusion
Moving to Checkmk is the best infrastructure decision I have made this year. It replaced a fragmented mess with a professional, unified platform that treats my network gear and servers with equal importance. It is simple to install, the agent is invisible to system performance, and the SNMP support is top-tier. If you want to stop guessing about your network health, start a trial in your lab today. The visibility is worth every minute of the setup.

