This weekend I finally cleaned up something that had been bothering me for a while: DNS and power handling in my cluster.
It technically worked, but it had a few glitches and, more importantly:
It wasn’t robust!
DNS: from “it runs” to actual redundancy
I had a single Pi-hole running in my Proxmox cluster. It filtered ads, handled some local DNS records, and even did recursive resolution via its own Unbound instance. Technically fine.
But there was a structural flaw.
If that container went down, local name resolution for hammer.home.lan, logos.home.lan, gnosis.home.lan and friends was gone. The cluster was still alive, but name resolution wasn’t.
That’s annoying! And with no second DNS server as a backup, a broken Pi-hole meant that all name resolution went out the window.
Ooopsie!
So... the fix? Two Pi-hole containers!
Yes, two!
Synced!
- Gravity Sync to keep blocklists and settings identical
- HA group in Proxmox, one Pi-hole per node
- DHCP now hands out both DNS servers
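A quick way to confirm that both Pi-holes really answer is to query each one directly instead of going through the system resolver (which just picks whichever server it likes). Below is a minimal sketch using only the Python standard library: it hand-builds an RFC 1035 A query and reports whether a given server replies with NOERROR and at least one answer record. It assumes each Pi-hole forwards home.lan queries on to OPNsense; the helper names are my own, and this is not part of Pi-hole or Gravity Sync.

```python
import random
import socket
import struct

# Hypothetical helpers for a DNS health check (not part of Pi-hole or Gravity Sync).


def build_query(name: str, qtype: int = 1) -> tuple[int, bytes]:
    """Build a minimal DNS query (RFC 1035). qtype 1 = A record, class IN."""
    txid = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/AR = 0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    return txid, header + question


def answers(server: str, name: str, timeout: float = 2.0) -> bool:
    """True if `server` answers an A query for `name` with NOERROR and >= 1 record."""
    txid, packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # timeout, unreachable host, refused port
            return False
    if len(reply) < 12:
        return False
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    rcode = flags & 0x000F  # 0 = NOERROR, 3 = NXDOMAIN
    return rid == txid and rcode == 0 and ancount > 0
```

Calling answers(server, "hammer.home.lan") once per Pi-hole address should give True twice. A False for one of them means clients are quietly leaning on the other one, which is exactly the situation the second container is meant to cover.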
But the biggest architectural change was this:
I moved local DNS authority to OPNsense, so Pi-hole now only filters.
OPNsense (Unbound) is now the single source of truth for both home.lan and (the soon-to-be-defunct) office.lan:
- Static host mappings
- Aliases like pihole1.home.lan, synology.home.lan, etc.
That removes the circular dependency of “DNS for the cluster depends on the cluster”.
And this perfectly fits one of my mantras: separation of concerns!
- Pi-hole = filtering layer
- OPNsense = authoritative resolver
- DHCP = consistent distribution
Much better!
The Synology glitch
Integrating my Synology as a NUT slave should have been easy.
It wasn’t! Holy crap Synology, how difficult can it be!
Everything worked from Linux clients right off the bat: upsc apc@192.168.1.10
BAM OUTPUT!
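If you want to use that output in a script instead of eyeballing it, a small parser is enough. This is a sketch that assumes upsc's usual one-variable-per-line "key: value" format and that the NUT client tools are installed on the machine running it; the function names are my own.

```python
import subprocess

# Hypothetical helpers around the `upsc` CLI (not part of NUT itself).


def parse_upsc(output: str) -> dict[str, str]:
    """Turn upsc output (one "key: value" pair per line) into a dict."""
    variables: dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip blank or unexpected lines
            variables[key.strip()] = value.strip()
    return variables


def read_ups(target: str) -> dict[str, str]:
    """Run `upsc <upsname>@<host>` and parse its stdout."""
    result = subprocess.run(
        ["upsc", target], capture_output=True, text=True, check=True
    )
    return parse_upsc(result.stdout)
```

For example, read_ups("apc@192.168.1.10")["ups.status"] typically gives OL on mains power and OB on battery, with LB added when the UPS flags a low battery.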
But DSM refused to connect.
After some digging I found the reason:
Synology has a really simple interface, but that also means it hardcodes:
- UPS name = ups
- Username = monuser
- Password = secret
Hardcoded!
I kid you not!
It is 2026!
So my UPS definition [apc] had to become [ups].
And I had to define this user in upsd.users:
[monuser]
    password = secret
    upsmon slave
The moment I did that, DSM connected instantly.
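If DSM still refuses to connect, one thing worth checking from any Linux box is which UPS names the master actually exports. The sketch below speaks NUT's plain-text network protocol on port 3493 and sends the LIST UPS command, which normally does not require a login. The helper names and the 3-second timeout are my own choices. If "ups" is missing from the returned list, DSM has nothing to find.

```python
import socket

# Hypothetical helpers speaking the NUT network protocol (not an official client).


def parse_list_ups(reply: str) -> list[str]:
    """Pull UPS names out of a NUT 'LIST UPS' reply.

    Expected shape:
        BEGIN LIST UPS
        UPS <name> "<description>"
        END LIST UPS
    """
    names = []
    for line in reply.splitlines():
        parts = line.split(" ", 2)
        if len(parts) >= 2 and parts[0] == "UPS":
            names.append(parts[1])
    return names


def list_ups(host: str, port: int = 3493, timeout: float = 3.0) -> list[str]:
    """Ask a NUT server (upsd) which UPS names it exports."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"LIST UPS\n")
        buf = b""
        # Read until the list is closed or the server answers with an error.
        while b"END LIST UPS" not in buf and not buf.startswith(b"ERR"):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            buf += chunk
    return parse_list_ups(buf.decode("utf-8", errors="replace"))
```

Something like "ups" in list_ups("192.168.1.10") returning True is the quick sign that the rename from [apc] to [ups] took effect.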
Power handling: no half measures
The APC UPS now connects via USB to one Proxmox node (hammer), which runs NUT as master. The other nodes and the Synology act as slaves.
Who came up with that name? NUT... Yeah... NUT... LOL!
…we are doing WHAT exactly?
And yes, depending on context:
- Nut → squirrel storage unit
- To nut → very different internet category
- Nut job → unstable person
- Nut and bolt → hardware
- Deez nuts → well… internet
And then you realize:
I just spent my weekend configuring NUT across three servers and a NAS.
Decisions
The important decision was about shutdown policy.
I want neither an immediate shutdown on short power blips nor a “maybe we cancel halfway” situation.
So I configured upssched:
- If on battery for 4 minutes → commit to shutdown
- If power returns before 4 minutes → cancel timer
- If shutdown starts → it completes fully
If we commit, we commit fully, no questions asked!
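In NUT itself this lives in upssched.conf, typically as an AT ONBATT * START-TIMER line with a 240-second interval and an AT ONLINE * CANCEL-TIMER line, plus a CMDSCRIPT that runs the actual shutdown. To make the commit-or-cancel logic explicit, here is a toy Python model of the same policy. It is not NUT code, just a sketch of the state machine using a hypothetical ShutdownPolicy class.

```python
from dataclasses import dataclass
from typing import Optional

GRACE_SECONDS = 4 * 60  # on battery for 4 minutes -> commit to shutdown


@dataclass
class ShutdownPolicy:
    """Hypothetical toy model of the commit-or-cancel timer policy (not NUT's code)."""

    timer_started_at: Optional[float] = None
    committed: bool = False

    def tick(self, now: float) -> bool:
        """Advance the clock; returns True once shutdown is committed."""
        if (
            not self.committed
            and self.timer_started_at is not None
            and now - self.timer_started_at >= GRACE_SECONDS
        ):
            self.committed = True
        return self.committed

    def on_event(self, event: str, now: float) -> None:
        """Feed a NUT-style notify event such as "ONBATT" or "ONLINE"."""
        if self.tick(now):
            return  # committed: nothing cancels the shutdown any more
        if event == "ONBATT" and self.timer_started_at is None:
            self.timer_started_at = now  # start the 4-minute timer
        elif event == "ONLINE":
            self.timer_started_at = None  # power is back in time: cancel
```

The early return in on_event is the "no half measures" rule: once the timer has expired, a late ONLINE event is simply ignored.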
With about 15 minutes runtime at current load, that leaves plenty of margin for:
- Clean VM shutdown
- Ceph clean exit
- Host shutdown
- Synology shutdown
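The margin itself is simple arithmetic, assuming the roughly 15-minute runtime estimate holds at the current load and the battery drains at a steady rate during the grace period:

```latex
t_{\text{margin}} \approx t_{\text{runtime}} - t_{\text{grace}} = 15\,\text{min} - 4\,\text{min} = 11\,\text{min}
```

So the whole sequence above has to fit in about 11 minutes. The real figure is whatever runtime the UPS reports at that moment, so treat this as an estimate rather than a guarantee.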
And when power returns, everything starts automatically via BIOS “Restore on AC Power”.
Glitches
And yeah, right after setting everything up my APC UPS decided to do a back flip.
It reported: “LOW BATTERY” over the USB connection to the master in the chain.
And guess what happened next?
Yes. Hammer down. Logos down. Gnosis trying to shut itself off but not fully committing. My NAS starting a shutdown, killing services, but never actually powering off.
HUH?
Turns out a single “LOW BATTERY” event was enough to trigger a forced shutdown cascade through NUT. No actual blackout. No dramatic power failure. Just a USB hiccup that briefly convinced the master it was time to end the world.
So for now I’ve disabled the automatic shutdown execution entirely. No timers. No EXEC hooks. Just raw events flowing straight into syslog.
Measure first.
Interpret later.
Then automate.
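“Measure first” can be as simple as counting what upsmon actually logs. A minimal sketch, assuming upsmon's default notification texts (these can be overridden with NOTIFYMSG, so check the patterns against your own logs): it tallies events per type so that a LOWBATT with no ONBATT around it stands out. The PATTERNS table and count_events function are my own, not part of NUT.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: upsmon's default notification texts; NOTIFYMSG can override them.
# Hypothetical pattern table, not part of NUT.
PATTERNS = {
    "ONLINE": re.compile(r"UPS \S+ on line power"),
    "ONBATT": re.compile(r"UPS \S+ on battery"),
    "LOWBATT": re.compile(r"UPS \S+ battery is low"),
    "FSD": re.compile(r"forced shutdown in progress"),
    "COMMBAD": re.compile(r"Communications with UPS \S+ lost"),
    "COMMOK": re.compile(r"Communications with UPS \S+ established"),
}


def count_events(lines: Iterable[str]) -> Counter:
    """Count NUT events per type in a stream of log lines."""
    counts: Counter = Counter()
    for line in lines:
        for event, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[event] += 1
                break  # one event per log line
    return counts
```

On Debian-based hosts such as Proxmox, the upsmon service is usually nut-monitor, so the lines from journalctl -u nut-monitor are a natural input. A LOWBATT count paired with zero ONBATT events is exactly the USB-hiccup pattern from the glitch above.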
The results!
What I have now:
- DNS over two nodes
- Authoritative Unbound on OPNsense
- Pi-hole redundancy with Gravity Sync
- NUT master/slave across cluster and NAS
- Timed, controlled shutdown policy (mostly)
Nothing exotic.
Just infrastructure that behaves predictably.
Which, honestly, is the whole point.
Todo
Still not perfect though.
I already kicked one of the nodes a few times because it’s sticking out from under my desk. Which is great, obviously.
High Availability is wonderful.
Unless it’s physically in the way of your feet.
Turns out cable management and rack depth are also part of cluster design. Who knew.
Brain out!