The Boot That Would Not End
Six months ago, I had a problem I kept ignoring. My production Ubuntu 22.04 server — 4GB RAM, running a handful of Docker containers and an Nginx reverse proxy — was taking over 40 seconds to boot. For planned maintenance windows, that was merely annoying. For unexpected reboots after kernel updates, it started to feel genuinely risky.
I assumed it was just “how Linux is” on aging hardware. Then a colleague asked whether I had ever actually looked at what was taking so long. I hadn’t. That one question changed how I think about server maintenance.
Here is what I found, what caused it, and the exact steps I used to bring that 40-second boot down to 18 seconds — without disabling anything critical.
Reading the Boot Timeline
Start with systemd-analyze. It ships with systemd, so nothing to install. Run the summary first:
systemd-analyze
On my server, this returned:
Startup finished in 3.102s (kernel) + 37.841s (userspace) = 40.943s
multi-user.target reached after 37.729s in userspace
The kernel portion is almost always under 5 seconds. Everything interesting happens in userspace — the chain of systemd units starting after kernel handoff. That 37-second number told me something was badly wrong.
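The second line reports when the default boot target was reached. If you are not sure which target your server boots to, check it first:
systemctl get-default
On a headless server this should be multi-user.target; a graphical target on a machine with no display is itself worth fixing.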
Drill into per-service timing with:
systemd-analyze blame
This lists every service sorted by startup duration. My top results:
34.201s apt-daily-upgrade.service
8.320s snapd.service
4.118s NetworkManager-wait-online.service
2.903s mysql.service
1.440s cloud-init.service
0.891s docker.service
0.703s ssh.service
One line told the whole story: apt-daily-upgrade.service was burning 34 seconds on its own.
Visualizing the Dependency Chain
Raw numbers miss the dependency picture — a service can look slow when it is actually blocked waiting on something else entirely. The SVG plot fills that gap:
systemd-analyze plot > boot-analysis.svg
Open it in any browser. What you get is a Gantt-style chart of every unit — start time, duration, and what each unit was waiting for. On my server, it showed that NetworkManager-wait-online.service was not doing any real work during its four seconds; it was simply idling until the network came up, which is exactly what it exists to do. The unit actually consuming the boot was apt-daily-upgrade.service, which started as soon as the network was online and then held up everything behind it.
For a faster text-based dependency trace on a specific unit:
systemd-analyze critical-chain multi-user.target
This finds the longest dependency chain leading to your default target. Mine showed:
multi-user.target @37.729s
└─apt-daily-upgrade.service @3.528s +34.201s
└─apt-daily.service @3.411s +0.089s
└─network-online.target @3.380s +0.031s
The chain was clear. Fix apt-daily-upgrade.service, and everything downstream speeds up.
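critical-chain also accepts a unit name, which is handy when you want to trace what one particular service was waiting on rather than the whole boot:
systemd-analyze critical-chain apt-daily-upgrade.service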
Root Cause: Three Categories of Slow Units
After auditing a dozen servers over the past six months, I have found that the culprits consistently fall into three buckets:
1. Services That Should Not Run at Boot
Automatic package upgrades (apt-daily-upgrade.service), snap refresh (snapd.service), and cloud-init on bare-metal servers all belong here. They serve a real purpose — just not at boot time while your server is trying to come online.
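A quick way to see which of these scheduled jobs exist on a given machine, and when they last ran and will next fire, is to list the timers:
systemctl list-timers --all
Anything on that list is a candidate for rescheduling rather than letting it compete with the boot.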
2. Services Waiting for the Network
NetworkManager-wait-online.service is the classic trap. It holds back network-online.target, and everything ordered after it, until the network is fully connected, but most services only need the network stack to be up, not a working route to the internet. On my server, this was adding 4 unnecessary seconds.
3. Misconfigured or Orphaned Services
A previous developer had installed a monitoring agent, then removed it — but left the systemd unit file behind. Every boot, it tried to start, failed, retried, and eventually timed out. Silently eating 6 seconds, every single time.
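The journal makes this kind of retry loop easy to spot. Using the same placeholder unit name as in the removal steps further down, the pattern looks like repeated start attempts ending in a timeout:
# unit name is a placeholder; substitute whatever systemctl flags as failed
journalctl -b -u old-monitoring-agent.service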
Fixing Each Category
Delaying Automatic Updates
The right fix for apt-daily-upgrade.service is to delay it, not kill it. Security updates still matter — you just do not want them running while users are waiting for the server to come back up.
sudo systemctl edit apt-daily-upgrade.service
Add this override to push it 10 minutes after boot:
[Unit]
After=multi-user.target
[Service]
ExecStartPre=/bin/sleep 600
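To confirm the drop-in was picked up, print the effective unit file; the override should appear below the stock definition:
systemctl cat apt-daily-upgrade.service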
Or use the timer to schedule it at a fixed hour instead:
sudo systemctl edit apt-daily-upgrade.timer
[Timer]
OnBootSec=15min
OnCalendar=
OnCalendar=03:00
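Restart the timer so the new schedule takes effect, then confirm it:
sudo systemctl restart apt-daily-upgrade.timer
systemctl list-timers apt-daily-upgrade.timer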
On my server, delaying the timer to 3 AM recovered 32 seconds of boot time on its own. Updates still run — just when nobody is watching.
Fixing NetworkManager-wait-online
Check whether anything critical actually depends on a fully online network at boot:
systemd-analyze dot --require | grep network-online
If nothing in your output looks essential, it is safe to disable the wait:
sudo systemctl disable NetworkManager-wait-online.service
If a service genuinely needs internet connectivity at startup — say, pulling configuration from Consul or a remote secrets store — keep it. But first ask whether that dependency is truly necessary, or just an unchecked default.
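If you do keep it, the cleaner pattern is to make only the units that need full connectivity pull in network-online.target, rather than letting the wait sit in front of the whole boot. A minimal sketch, assuming a hypothetical my-config-sync.service that fetches settings from a remote store at startup:
sudo systemctl edit my-config-sync.service
[Unit]
# hypothetical unit: only this service waits for full connectivity
Wants=network-online.target
After=network-online.target
With that override in place, only the units that declare the dependency wait for the network; everything else proceeds as soon as the stack is up.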
Removing Orphaned Units
Find what is failing silently:
systemctl list-units --state=failed
For each failed unit tied to software you no longer run:
sudo systemctl disable --now old-monitoring-agent.service
sudo rm /etc/systemd/system/old-monitoring-agent.service
sudo systemctl daemon-reload
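Once the file is gone, clear the leftover failed state so the unit stops showing up in status output:
sudo systemctl reset-failed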
Handling snapd
Not using snap packages? Removing snapd entirely is cleaner than disabling it:
# First, list installed snaps
snap list
# Remove each snap you do not need
sudo snap remove --purge snap-store
sudo snap remove --purge core20
# Then remove snapd itself
sudo apt remove --purge snapd
sudo apt-mark hold snapd
The apt-mark hold stops snapd from sneaking back in via other package upgrades. On servers where snap is unused, this alone recovers 6–10 seconds.
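After the next reboot, it is worth double-checking that nothing snap-related crept back into the timeline:
systemd-analyze blame | grep -i snap
systemctl list-units --all | grep -i snap
Both commands should come back empty.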
Masking vs Disabling vs Delaying
These three options get mixed up constantly, and choosing wrong causes real problems:
- Disable — removes the service from automatic startup, but it can still be started manually or pulled in as a dependency by another unit.
- Mask — completely blocks the unit, even if another service tries to pull it in. Use this only when you are certain the unit should never run on this machine.
- Delay (override) — keeps the unit enabled but changes when it runs. Best choice for maintenance tasks that have real value but the wrong timing.
# Check what breaks if you disable a unit
systemctl list-dependencies --reverse apt-daily-upgrade.service
# Mask a unit you never want running
sudo systemctl mask iscsid.service
# Undo masking
sudo systemctl unmask iscsid.service
Always run the reverse dependency check before disabling anything. I skipped it once, disabled a service three others depended on, and spent an hour debugging broken Docker networking on the next boot.
Verifying Your Changes
After each change, reboot and measure:
systemd-analyze
systemd-analyze blame | head -20
Compare against your baseline. This measure-change-measure loop is exactly how I got from 40 seconds to 18 — three rounds of targeted fixes, not one big sweep.
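I also keep a running record so regressions stand out at the next audit. A minimal sketch; the log path is just a convention I use, not anything standard:
# append this boot's numbers to a simple log (path is an arbitrary choice)
{ date; systemd-analyze; } | sudo tee -a /var/log/boot-times.log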
Final numbers after all changes:
Startup finished in 3.091s (kernel) + 14.802s (userspace) = 17.893s
multi-user.target reached after 14.690s in userspace
The top entries in systemd-analyze blame are now legitimate — mysql, docker, nginx. Things that genuinely need time to initialize. Nothing wasted.
What Actually Matters
Boot time is just the metric. What you are really doing with systemd-analyze is making visible something most sysadmins never look at: exactly what your server is doing during those first 40 seconds, and why.
Slow boots are almost always accumulated cruft — services installed and forgotten, defaults never reviewed, desktop-oriented dependencies running on a headless server. A 30-minute boot audit every few months pays dividends: faster recovery after incidents, fewer mysterious startup failures, and a clear picture of what is actually running on your machines.
Four commands to internalize:
- systemd-analyze — total time
- systemd-analyze blame — per-unit breakdown
- systemd-analyze critical-chain — longest dependency path
- systemd-analyze plot — visual timeline
Start with blame. Find your top offender. Understand whether it is slow by itself or waiting on something else. Then choose: delay, disable, or mask. Repeat until the numbers look reasonable. It is not complicated — it just requires actually looking.

