Proxmox High Availability Setup for Automatic VM Failover

Set up Proxmox HA Manager to automatically restart VMs after a node failure. Covers fencing requirements, HA group config, and live failover testing on PVE 9.1.


Proxmox High Availability Manager restarts your VMs automatically on a surviving node within about 60-90 seconds of detecting a node failure — no manual intervention, no SSH session at 3am. By the end of this guide, you'll have a working HA cluster with properly configured fencing, HA groups, and a tested failover. I'm running this on a three-node Proxmox VE 9.1 cluster with Ceph shared storage, but the procedure is identical for iSCSI or NFS-backed clusters.

Key Takeaways

  • 3 nodes minimum: Two-node clusters can't maintain quorum after a single failure — HA needs a majority vote to proceed.
  • Fencing is mandatory: Without a working watchdog or IPMI fence agent, Proxmox HA will refuse to restart VMs to avoid split-brain data corruption.
  • Shared storage required: VMs must live on storage accessible from all nodes — Ceph, iSCSI, NFS, or shared ZFS over FC.
  • Recovery takes 60-90 seconds: The delay is deliberate — Proxmox waits for fencing confirmation before restarting anything.
  • Test with a hard power-off: A graceful shutdown doesn't replicate a real failure scenario.

How Proxmox HA Actually Works

Proxmox HA runs on two daemons: pve-ha-lrm (Local Resource Manager, one per node) and pve-ha-crm (Cluster Resource Manager, one elected leader per cluster). The CRM watches resource states; the LRM executes commands on its local node.

When a node goes down, the sequence is:

  1. Corosync marks the node unreachable after missed heartbeats.
  2. The CRM waits for fencing confirmation — either a watchdog reset or an IPMI power-cycle that proves the failed node is genuinely off.
  3. Once fenced, the CRM issues relocate or restart commands for all HA-managed VMs.
  4. The LRM on a surviving node starts each VM from the shared storage pool.

Step 2 is where most misconfigured HA setups stall. Without fencing, the CRM correctly refuses to restart VMs — the original node might still be running and holding disk locks, and starting a second instance would corrupt the VM's filesystem.

Why the Three-Node Minimum Matters

Corosync requires a majority (quorum) to operate. With two nodes, losing one leaves you at exactly 50% of the votes: no majority, so cluster services halt. With three nodes, losing one still leaves two of three votes, so quorum holds and HA proceeds normally.

You can work around a two-node cluster with a lightweight qdevice (a tie-breaker service running on something like a Raspberry Pi), but three nodes is the cleaner path. If you're starting from scratch, the guide on building a private Proxmox cloud at home walks through the full multi-node cluster setup prerequisites.
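The majority rule is simple arithmetic, and it's worth seeing why three is the magic number. A quick sketch in plain bash, with no Proxmox dependency:

```shell
#!/usr/bin/env bash
# Votes needed for quorum in an N-node Corosync cluster:
# strictly more than half, i.e. floor(N/2) + 1.
quorum_needed() { echo $(( $1 / 2 + 1 )); }

# Votes remaining after F node failures.
votes_left() { echo $(( $1 - $2 )); }

for nodes in 2 3 5; do
  need=$(quorum_needed "$nodes")
  left=$(votes_left "$nodes" 1)
  if [ "$left" -ge "$need" ]; then
    echo "$nodes nodes: lose 1, ${left}/${nodes} votes remain -> quorum holds"
  else
    echo "$nodes nodes: lose 1, ${left}/${nodes} votes remain -> quorum LOST"
  fi
done
```

On a live cluster, `pvecm status` reports the same numbers under its votequorum section, so you can check the real values against this arithmetic.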

What You Need Before Enabling HA

Check all of these before touching the HA configuration panel. Missing any one of them produces an HA setup that looks active but silently fails when you actually need it.

Cluster:

  • Three or more PVE 9.1 nodes in the same cluster
  • Corosync heartbeat latency under 5ms — use a dedicated cluster NIC if you can
  • Synchronized time on all nodes: run chronyc tracking and confirm offset under 100ms
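The time-sync item is easy to sweep across the whole cluster at once. A sketch, assuming root SSH access and the hostnames pve1 through pve3 (substitute your own); the offset parser is plain awk over `chronyc tracking` output:

```shell
#!/usr/bin/env bash
# Extract the "Last offset" line from `chronyc tracking` output and
# convert it to absolute milliseconds (rounded), e.g.
#   "Last offset     : +0.052000000 seconds"  ->  52
offset_ms() {
  awk -F: '/Last offset/ {
    gsub(/[ +]|seconds/, "", $2)
    v = $2; if (v < 0) v = -v
    printf "%.0f\n", v * 1000
  }'
}

# Check each node; flag anything over the 100 ms guideline.
# Hostnames are examples; substitute your own.
for node in pve1 pve2 pve3; do
  ms=$(ssh -o BatchMode=yes -o ConnectTimeout=2 "root@$node" \
        chronyc tracking 2>/dev/null | offset_ms)
  [ -z "$ms" ] && { echo "$node: chronyc unreachable"; continue; }
  [ "$ms" -lt 100 ] && echo "$node: offset ${ms}ms OK" \
                    || echo "$node: offset ${ms}ms TOO HIGH"
done
```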

Storage:

  • Target VMs must use shared storage: Ceph RBD, iSCSI, NFS, or Fibre Channel
  • Local storage (local-lvm, local-zfs) silently disqualifies a VM from HA eligibility

Fencing:

  • A hardware watchdog device at /dev/watchdog or /dev/watchdog0
  • Or IPMI/iDRAC/iLO configured as a fence agent with tested, working credentials

Verify your watchdog device is present:

ls /dev/watchdog*

If nothing appears, load the software fallback as a stopgap (acceptable for testing, not for production):

modprobe softdog
echo "softdog" >> /etc/modules

Configure the Hardware Watchdog with watchdog-mux

Proxmox ships watchdog-mux, a daemon that multiplexes the watchdog device so multiple HA processes can share it safely. It must be running on every cluster node.

Check and enable it:

systemctl status watchdog-mux
systemctl enable --now watchdog-mux

Verify the LRM connected to it:

journalctl -u pve-ha-lrm --since "5 minutes ago" | grep -i watchdog

You should see a line confirming the LRM opened /run/watchdog-mux.sock. Errors here mean fencing is broken and recovery will hang indefinitely.

The watchdog timeout is configurable:

# /etc/default/pve-ha-manager
HA_WATCHDOG_TIMEOUT=60

The 60-second default is appropriate for most setups. Shorter values increase sensitivity to transient network blips; longer values delay recovery.

Setting Up IPMI Fencing for Bare-Metal Nodes

For bare-metal servers with IPMI — which covers most enterprise hardware and many homelab boards — IPMI fencing is more reliable than a software watchdog alone. It gives you hard power control even when the OS is completely unresponsive.

Install the fence agents package on all nodes:

apt install fence-agents

Test your BMC credentials before configuring anything:

fence_ipmilan -a 192.168.1.52 -l admin -p yourpassword -o status

Expected output: Status: ON. If this fails, fix IPMI access first — there is no point configuring HA fencing around a broken BMC connection. While you're securing IPMI access, make sure it's restricted to your management VLAN; the Proxmox hardening guide has practical firewall rules for exactly this scenario.
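Testing one BMC is good; testing all of them before trusting HA is better. A sketch of a sweep: the node names and BMC addresses are placeholder values, the username `admin` and the `IPMI_PASS` environment variable are assumptions to adapt, and the loop degrades to an install hint on machines without fence-agents:

```shell
#!/usr/bin/env bash
# Map each cluster node to its BMC address (example values only).
declare -A BMC=(
  [pve1]=192.168.1.51
  [pve2]=192.168.1.52
  [pve3]=192.168.1.53
)

if command -v fence_ipmilan >/dev/null 2>&1; then
  for node in "${!BMC[@]}"; do
    # -o status is read-only: it queries chassis power, nothing more.
    if fence_ipmilan -a "${BMC[$node]}" -l admin -p "$IPMI_PASS" -o status; then
      echo "$node (${BMC[$node]}): fencing reachable"
    else
      echo "$node (${BMC[$node]}): FENCE TEST FAILED"
    fi
  done
else
  echo "fence-agents not installed -- run: apt install fence-agents"
fi
```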

Configure the fence agent per-node via the Proxmox API:

pvesh set /nodes/pve2/config \
  --fence-plugin ipmilan \
  --fence-ipmi-ip 192.168.1.52 \
  --fence-ipmi-user admin \
  --fence-ipmi-password yourpassword

How to Create HA Groups and Enroll VMs

Create an HA Group

HA groups control which nodes are eligible to run a set of VMs and in what priority order. Navigate to Datacenter → HA → Groups → Add, or use the API:

pvesh create /cluster/ha/groups \
  --group critical-vms \
  --nodes "pve1:3,pve2:2,pve3:1"

The trailing number is priority — higher wins. Equal priority means Proxmox picks the surviving node arbitrarily.

Group options worth knowing:

  • restricted: VMs only ever run on nodes listed in this group
  • nofailback: VMs don't migrate back when the preferred node recovers
  • Node priority: determines which surviving node receives the VM first

Start with one group containing all nodes at equal priority. Tune after watching real failovers.
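When you do want the stricter behavior later, both options are flags on the same group-creation call. A sketch reusing the group and node names from above, guarded so it is a no-op anywhere except a PVE node:

```shell
#!/usr/bin/env bash
# restricted=1 pins VMs to the listed nodes; nofailback=1 stops the
# automatic migration back after the preferred node returns.
args=(--group critical-vms
      --nodes "pve1:3,pve2:2,pve3:1"
      --restricted 1
      --nofailback 1)

# Only meaningful on a PVE node; harmless no-op elsewhere.
command -v pvesh >/dev/null 2>&1 \
  && pvesh create /cluster/ha/groups "${args[@]}" \
  || true
```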

Add VMs and Containers to the HA Group

In the web UI: select a VM, click More → Manage HA. Or with the CLI:

pvesh create /cluster/ha/resources \
  --sid vm:101 \
  --group critical-vms \
  --state started \
  --max_restart 3 \
  --max_relocate 3

  • --state started: the desired state HA will actively maintain
  • --max_restart: restart attempts on the current node before escalating to relocation
  • --max_relocate: relocation attempts across nodes before marking the resource failed

LXC containers use --sid ct:102. Confirm all enrolled resources:

pvesh get /cluster/ha/resources

Before adding a VM, always verify its disk is on shared storage:

qm config 101 | grep -E "^(scsi|virtio|ide|sata)"
# You want output like:
# scsi0: ceph-pool:vm-101-disk-0,size=32G
# Not:
# scsi0: local-lvm:vm-101-disk-0,size=32G

A VM on local-lvm appears enrolled and healthy in the HA panel, then silently fails to recover when you need it most. There is no warning at enrollment time.
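That silent-failure mode is easy to catch with a sweep over every enrolled resource. A sketch: the JSON scraping with grep is deliberately rough, and the cluster calls only fire on a real PVE node, but the `uses_local_storage` check itself is plain text matching you can adapt to your storage names:

```shell
#!/usr/bin/env bash
# Return success if a qm-config disk line points at local storage,
# e.g. "scsi0: local-lvm:vm-101-disk-0,size=32G"
uses_local_storage() {
  echo "$1" | grep -Eq '^[a-z]+[0-9]+: (local|local-lvm|local-zfs):'
}

if command -v pvesh >/dev/null 2>&1; then
  # Walk every HA-enrolled VM and flag any disk on local storage.
  for sid in $(pvesh get /cluster/ha/resources --output-format json \
               | grep -o 'vm:[0-9]*'); do
    vmid=${sid#vm:}
    qm config "$vmid" | grep -E '^(scsi|virtio|ide|sata)' | while read -r line; do
      if uses_local_storage "$line"; then
        echo "WARNING vm $vmid will not recover: $line"
      fi
    done
  done
fi
```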

How to Test HA Failover the Right Way

Do not use systemctl poweroff to test failover. A clean shutdown lets the node announce its departure to the cluster, which changes how the CRM handles the transition — it's not a realistic crash simulation.

Use a hard power-off instead. From a machine with IPMI access:

ipmitool -H 192.168.1.51 -U admin -P yourpassword chassis power off

Alternatively, on a dedicated test node, force a kernel panic:

# WARNING: This immediately crashes the system. Test nodes only.
echo c > /proc/sysrq-trigger

Watch recovery in real time from a surviving node:

watch -n2 "pvesh get /cluster/ha/status/current"

Expected timeline:

  • 0-30s: Corosync detects the absent node, CRM initiates fencing
  • 30-60s: Watchdog resets the failed node, or IPMI confirms power-off
  • 60-90s: CRM issues relocation commands; LRM brings VMs online on the surviving node

If the status stays in recovery past 90 seconds, the CRM is waiting on a fencing confirmation that never arrived:

journalctl -u pve-ha-crm -f

The log will tell you exactly which fence operation stalled. It's almost always either watchdog-mux not running on every node after a reboot, or stale IPMI credentials.

Common HA Mistakes to Avoid

VM on local storage. Enrolled in HA, appears healthy, fails silently on recovery. Verify storage before adding any resource.

Skipping the IPMI fence test. fence_ipmilan ... -o status takes 10 seconds to run. Skipping it takes hours to debug when HA stalls at 3am.

Two nodes without a qdevice. One failure, no quorum, HA freezes. Either add a third node or deploy corosync-qnetd on a lightweight device before relying on HA for anything real.

NTP drift. Corosync is sensitive to clock skew. Offset over a few hundred milliseconds triggers spurious node-unreachable events. Run timedatectl status on each node and confirm NTP is active and synced.

max_restart set to 1. A VM that needs 45 seconds to complete its startup health check will relocate unnecessarily on the first failed check. Set max_restart to at least 3 for non-trivial workloads.

No N-1 capacity planning. HA restarts VMs, but if surviving nodes are already at 90% RAM utilization, the VMs fail to start anyway. For a three-node cluster with 128 GB per node, plan as though any single node may be absent — cap total allocated RAM at 256 GB.
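The N-1 arithmetic is worth scripting so you can re-check it as VMs accumulate. A sketch in plain bash: the RAM figures are the illustrative ones from the text, and on a real cluster you would feed it your allocated total (the sum of the `memory:` lines from `qm config`) and each node's physical RAM:

```shell
#!/usr/bin/env bash
# N-1 capacity rule: RAM allocated to HA VMs must fit on the cluster
# minus its single largest node.
# Args: allocated_gb node1_gb node2_gb ...
n1_headroom_ok() {
  local allocated=$1; shift
  local total=0 largest=0 node
  for node in "$@"; do
    total=$(( total + node ))
    if [ "$node" -gt "$largest" ]; then largest=$node; fi
  done
  local usable=$(( total - largest ))
  if [ "$allocated" -le "$usable" ]; then
    echo "OK: ${allocated}G allocated, ${usable}G usable with worst node down"
  else
    echo "OVERCOMMITTED: ${allocated}G allocated, only ${usable}G with worst node down"
    return 1
  fi
}

# Example from the text: three 128 GB nodes leave 256 GB usable under N-1.
n1_headroom_ok 200 128 128 128
```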

Conclusion

With watchdog-mux confirmed running, shared storage in place, and VMs enrolled in HA groups, Proxmox automatically recovers critical workloads within 90 seconds of a node failure. Fencing isn't bureaucratic overhead — it's the safety mechanism that makes corruption-free restarts possible. Run the hard power-off test before you declare success.

Once HA is protecting your VMs at the infrastructure level, add point-in-time recovery at the data level: schedule regular backups via Proxmox Backup Server so that even a storage failure has a fallback beyond the last snapshot.


Written by

Proxmox Pulse

Sysadmin-driven guides for getting the most out of Proxmox VE in production and homelab environments.
