How to Set Up a Proxmox Cluster: Complete Two-Node Guide

A clustered Proxmox setup lets you run virtual machines that survive node failures without manual intervention: with high availability (HA) configured, VMs from a failed server are restarted automatically on a surviving node, running VMs can be live-migrated between healthy nodes for planned maintenance, and shared storage is accessible from every machine in the cluster. This guide walks through installing two or more nodes, wiring them together into a functional cluster with shared storage, and verifying everything works before you start moving workloads over.

Key Takeaways

Time sync: Clock drift between nodes triggers Ceph clock-skew warnings, can invalidate login tickets, and makes logs hard to correlate; verify chrony is running on every node before joining.
Network choice: Use dedicated data interfaces for inter-node traffic to avoid VM migration bottlenecks.
Storage first: Configure shared storage (Ceph or NFS) before moving workloads onto the cluster, so every VM disk is reachable from both nodes from the start.
Quorum math: A two-node cluster needs a quorum witness (QDevice); without it, losing either node drops the survivor below quorum, which blocks configuration changes and HA recovery.
Verify early: Run pvecm status and corosync-cfgtool -s after every join to catch issues before they compound.

What Makes Proxmox Clustering Different from a Single Node?

A single Proxmox node handles storage, networking, and compute independently. A cluster adds two things: shared state (via Corosync) that lets nodes communicate about resource ownership, and the ability for virtual machines to live-migrate between nodes without shutting down. The first is automatic once you join nodes; the second requires configuration — specifically a network path from each node to your storage back-end.

The tradeoff here is complexity versus resilience. A two-node cluster with shared storage gives you HA VMs and live migration, but it also means every failure domain (network switch, power supply, disk) now matters twice as much. Most homelab operators find this worth it once they have more than one VM running simultaneously — see Build a Private Cloud at Home for context on when clustering makes sense in smaller setups.

How to Prepare Your Nodes Before You Start

The most common reason clusters fail during or shortly after setup is poor pre-flight configuration, not the join process itself. Three things matter most: network layout, time synchronization, and storage reachability.

Network Requirements

Each node should have two IP addresses where possible: one for management (the IP you'll use to access the web UI) and one on a dedicated cluster interconnect. The interconnect carries Corosync heartbeats, live migration traffic, and replication streams. Corosync is sensitive to latency, so a link saturated by migrations can cause spurious membership changes. If your nodes share a single 1 GbE link for both management and data, everything still works; just expect slower migrations during heavy I/O and keep an eye on cluster stability.

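Create the bridges on each node before joining. The installer usually defines vmbr0 in /etc/network/interfaces already; if so, drop that stanza from the snippet below so it is not declared twice, and confirm that /etc/network/interfaces contains a "source /etc/network/interfaces.d/*" line so the new file is actually read: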

cat > /etc/network/interfaces.d/50-pve.cfg << 'EOF'
auto vmbr0
iface vmbr0 inet static
    address 192.168.1.10/24
    gateway 192.168.1.1
    bridge-ports eth0
    bridge-stp off
    bridge-fd 0

auto vmbr1
iface vmbr1 inet static
    address 10.10.0.1/24
    bridge-ports eth1
    bridge-stp off
    bridge-fd 0
EOF

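Adjust the address and bridge-ports values to match your hardware. Each node needs its own addresses (for example 10.10.0.2/24 on vmbr1 of the second node). Apply the change with ifreload -a. The second interface (eth1) is optional but recommended if you have a separate NIC or port for cluster traffic.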
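As a quick sanity check, the bridge addresses can be pulled out of an interfaces file and compared across nodes. The sketch below is only an illustration: the function name bridge_addresses is hypothetical, and it assumes the stanza layout shown above (an iface line followed by an indented address line).

```shell
# Hypothetical helper: print "<bridge> <address>" for every vmbr* stanza
# found in an ifupdown-style interfaces file read from stdin.
bridge_addresses() {
    awk '
        $1 == "iface"                     { name = $2 }
        $1 == "address" && name ~ /^vmbr/ { print name, $2 }
    '
}

# Example (on a node):
#   bridge_addresses < /etc/network/interfaces.d/50-pve.cfg
```

Running it on both nodes and comparing the output makes duplicate addresses or a missing vmbr1 easy to spot before the join.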

Time Synchronization

Corosync itself detects failed nodes through token timeouts rather than wall-clock timestamps, but much of the rest of the stack depends on synchronized clocks. Ceph monitors raise clock-skew warnings at around 50 ms of drift, API login tickets are time-limited, and logs from different nodes become hard to correlate when the clocks disagree.

Verify NTP on every node before proceeding:

chronyc tracking | grep -E "System time|Leap status"
systemctl is-enabled chrony
journalctl -u chrony --no-pager -n 10

If chronyc reports a leap status of "Normal" and the system time offset stays under 50 ms, you're good. Chrony has been the default time daemon on new installs since Proxmox VE 7; earlier releases used systemd-timesyncd. Run systemctl enable chrony --now if needed.
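The 50 ms check can be scripted so it is easy to repeat on every node. This is a minimal sketch, assuming the usual chronyc tracking layout in which the offset is the fourth whitespace-separated field of the "System time" line; the function name clock_offset_ok is hypothetical.

```shell
# Hypothetical helper: read `chronyc tracking` output on stdin and succeed
# only if the "System time" offset is below the limit (default 0.050 s).
clock_offset_ok() {
    awk -v max="${1:-0.050}" '
        /^System time/ { offset = $4 + 0; seen = 1 }
        END { exit !(seen && offset < max) }
    '
}

# Example (on a node):
#   chronyc tracking | clock_offset_ok && echo "clock OK"
```

The function fails when the "System time" line is missing, so a stopped chrony daemon is reported as a problem rather than silently passing.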

The Installation Sequence That Actually Works

The official documentation lists the join command, but in practice there's a sequence that avoids common pitfalls — particularly around SSH keys and storage plugins loading after the first node boots.

Installing the First Node

Install Proxmox VE on your first node using any of the standard methods (the ISO installer, a PXE-booted install, or installing the packages on top of an existing Debian system). After booting into the fresh system:

# Verify the installation succeeded
pveversion -v

# Set the hostname if it isn't correct
hostnamectl set-hostname proxmox-node1
echo "192.168.1.10 $(hostname)" >> /etc/hosts

# Create the cluster, binding Corosync link 0 to this node's address on the cluster network
pvecm create mycluster --link0 10.10.0.1

The pvecm create command initializes Corosync and writes /etc/pve/corosync.conf. The --link0 flag takes this node's IP address on the cluster network and is optional: if you skip it, Proxmox uses the address that the node's hostname resolves to, which is normally the management IP. That works fine for homelabs, but it puts Corosync on the same link as everything else and can produce surprising results on multi-NIC servers with stale /etc/hosts entries.
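To confirm which address Corosync actually bound to, the link 0 entries can be read back from the Corosync configuration file (/etc/pve/corosync.conf on current Proxmox VE releases). The sketch below assumes the usual nodelist layout, where each node block lists name: before ring0_addr:; the function name link0_addresses is hypothetical.

```shell
# Hypothetical helper: print "<node name> <link 0 address>" for each node
# block in a corosync.conf read from stdin.
link0_addresses() {
    awk '
        $1 == "name:"       { node = $2 }
        $1 == "ring0_addr:" { print node, $2 }
    '
}

# Example (on a node):
#   link0_addresses < /etc/pve/corosync.conf
```

If the addresses printed are management IPs rather than 10.10.0.x ones, Corosync is not using the dedicated interconnect.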

Joining Additional Nodes

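On each additional node (after installing a fresh copy of Proxmox VE and before creating any VMs on it), run the join against the first node's IP, passing the joining node's own cluster-network address as --link0:

pvecm add 192.168.1.10 --link0 10.10.0.2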

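This copies the Corosync configuration and authentication key to the new node and restarts the cluster services. No extra packages are installed: Corosync ships with Proxmox VE, and Pacemaker is not used (HA is handled by Proxmox's own ha-manager). The command prompts for the root password of the first node, which is the one you set during its installation. The existing content of /etc/pve on the joining node is overwritten, which is why the node should not hold any guests yet.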

After joining, verify:

pvecm status
corosync-cfgtool -s
ls /etc/pve/

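The last command should show corosync.conf, nodes/, and storage.cfg, all replicated across nodes by the Proxmox cluster file system (pmxcfs) on top of Corosync. A standalone node also has files in /etc/pve/, so the meaningful check is that /etc/pve/nodes/ lists a directory for every node; if it does, the join succeeded.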

Setting Up Cluster Storage and Networking

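Once your nodes are joined, configure shared storage before adding workloads. Without it, live migration has to copy each VM's disks between nodes along with its memory, and HA cannot restart a VM elsewhere because its disk exists only on the failed node.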

Adding Shared Storage

Ceph is Proxmox's native clustered storage engine, but it is designed for three or more nodes: on two nodes, the default three-replica pools cannot become healthy unless you lower the replica count. For two-node homelabs, or for anyone without Ceph expertise, NFS or iSCSI are simpler alternatives. See Optimizing ZFS Pools in Proxmox VE if you're using local ZFS pools as your primary storage instead.

For a quick start with Ceph:

# On every node, install the Ceph packages first
pveceph install

# On the first node, initialize Ceph on the cluster network and create a monitor
pveceph init --network 10.10.0.0/24
pveceph mon create

# Create OSDs on all nodes (assuming /dev/sdb is an unused disk on each)
for node in proxmox-node1 proxmox-node2; do
    ssh root@"$node" "pveceph osd create /dev/sdb"
done

# Create a pool and register it as Proxmox storage in one step
pveceph pool create vm-pool --add_storages

For NFS instead (simpler setup):

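# On the NFS server (192.168.1.50 here), export a directory to the management subnet only
echo "/srv/proxmox 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)" >> /etc/exports
exportfs -ra

# On any one Proxmox node, add it as storage (the definition is cluster-wide
# and gets mounted under /mnt/pve/prox-nfs on every node)
pvesm add nfs prox-nfs --server 192.168.1.50 \
    --export /srv/proxmox --content images,rootdir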

Configuring the Cluster Network

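By default, live migration traffic uses the same network as cluster communication, not a particular bridge. To pin it to the dedicated 10.10.0.0/24 network, set the migration network under Datacenter > Options, or add migration: secure,network=10.10.0.0/24 to /etc/pve/datacenter.cfg. Then verify that each node can reach every other node on that network:

for i in 1 2; do
    ping -c 3 10.10.0.$i
done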

If migration fails later, the first place to look is network connectivity between nodes on the data interface. Also check that your firewall allows TCP port 8006 (PVE API), TCP port 22 (SSH, which tunnels secure migrations), UDP ports 5405-5412 (Corosync), and TCP ports 60000-60050 (live migration):

iptables -L -n | grep -E '5405|60000'

How to Verify Your Cluster Is Healthy

Before moving any VMs, run through this checklist. Each step catches a different class of failure mode.

Step 1 — Quorum and Corosync:

pvecm status
corosync-cfgtool -s

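pvecm status should list every node under its membership information, report Quorate: Yes, and show Total votes equal to Expected votes (including the qdevice vote, if configured). corosync-cfgtool -s should show each link to the other nodes as connected. If a node is missing or a link is reported as disconnected, check its network path back to the first node.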
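The quorum check lends itself to a small script for monitoring or post-join verification. The sketch below assumes the usual pvecm status layout with "Quorate:", "Expected votes:" and "Total votes:" lines; the function name cluster_is_healthy is hypothetical.

```shell
# Hypothetical helper: read `pvecm status` output on stdin and succeed only
# if the cluster is quorate and every expected vote is present.
cluster_is_healthy() {
    awk '
        /^Quorate:/        { quorate  = ($2 == "Yes") }
        /^Expected votes:/ { expected = $3 }
        /^Total votes:/    { total    = $3 }
        END { exit !(quorate && expected != "" && expected == total) }
    '
}

# Example (on a node):
#   pvecm status | cluster_is_healthy || echo "cluster needs attention"
```

Comparing total against expected votes catches the case where the cluster is still quorate but a node or the qdevice has dropped out.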

Step 2 — Storage reachability:

pvesm status
df -h /mnt/pve/<your-storage>

If storage appears but is empty, verify that all nodes can mount it simultaneously without conflicts. NFS exports with no_root_squash are the easiest to debug; Ceph pools require checking OSD health with ceph osd tree.
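To spot a storage that is defined but not usable on a given node, the status column of pvesm status can be filtered. This is a sketch under the assumption that the output has a header line followed by rows whose third column is the status (active, inactive, or disabled); the function name inactive_storages is hypothetical.

```shell
# Hypothetical helper: read `pvesm status` output on stdin and print the
# name of every storage whose status column is not "active".
inactive_storages() {
    awk 'NR > 1 && $3 != "active" { print $1 }'
}

# Example (run on each node in turn):
#   pvesm status | inactive_storages
```

Empty output on every node means each storage is mounted and reachable everywhere it is defined.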

Step 3 — Live migration test:

Create a small VM on node one and migrate it:

qm migrate <vmid> proxmox-node2 --online

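Watch for errors with journalctl -f (or in /var/log/syslog on installs that still run rsyslog) during the operation. With shared storage, a live migration copies only the guest's memory, so the duration depends on RAM size rather than disk size: an 8 GiB guest needs a little over a minute at Gigabit line rate, and longer if the guest is busy dirtying memory. If you see Migration failed (code -1) with no other messages, the cause is often a storage path issue: verify that the storage defined in /etc/pve/storage.cfg is active on node two and that node two can read and write to it.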
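A rough lower bound for migration time follows from the guest's RAM size and the link speed. The sketch below assumes shared storage (so only memory is transferred) and ignores protocol overhead, zero-page skipping, and re-sent dirty pages, so real migrations will differ; the function name estimate_migration_seconds is hypothetical.

```shell
# Hypothetical helper: estimate the seconds needed to push a guest's RAM
# across a link at full line rate.
#   $1 = guest RAM in MiB, $2 = link speed in Mbit/s
estimate_migration_seconds() {
    awk -v ram_mib="$1" -v mbit="$2" 'BEGIN {
        bits = ram_mib * 1024 * 1024 * 8
        printf "%.0f\n", bits / (mbit * 1000 * 1000)
    }'
}

estimate_migration_seconds 8192 1000    # 8 GiB guest over 1 GbE
estimate_migration_seconds 8192 10000   # same guest over 10 GbE
```

A migration that takes many times longer than this estimate usually points at a slow or shared link rather than at the guest itself.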

Single Node vs. Two-Node Cluster: When It Matters

Not every workload benefits from clustering. Here's where the difference shows up in practice:

Aspect	Single Node	Two-Node Cluster
VM live migration	N/A (within same node)	Yes, between nodes
HA on failure	Manual restart	Automatic failover (with qdevice)
Storage reachability	Local only	Shared across all nodes
Quorum risk	None	Quorum loss without witness
Complexity cost	Low	Moderate — more to monitor

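For homelab operators, the sweet spot is usually two nodes with shared storage and a qdevice running on a third machine, such as a Raspberry Pi. The qdevice must live outside the cluster nodes; hosting it on one of them would take the tie-breaking vote down together with that node. With the qdevice's extra vote, whichever node can still reach it keeps quorum when the other node fails or loses connectivity. Without it, a lone node holds only one of two votes and loses quorum: /etc/pve becomes read-only, VMs cannot be started or reconfigured, and HA cannot recover guests from the failed node. VMs that were already running keep running, which is fine for stateless workloads but leaves a database VM on the dead node down until you intervene.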
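The vote arithmetic behind this is simple majority: a partition needs more than half of all votes to stay quorate. A minimal sketch, with hypothetical function names quorum_needed and tolerable_failures:

```shell
# Hypothetical helpers illustrating majority quorum.
#   $1 = total number of votes in the cluster (nodes plus qdevice, if any)
quorum_needed() {
    # Smallest strict majority: floor(votes / 2) + 1
    echo $(( $1 / 2 + 1 ))
}

tolerable_failures() {
    # Votes that can disappear while the remainder still holds a majority
    echo $(( $1 - ($1 / 2 + 1) ))
}

quorum_needed 2        # two nodes, no witness
tolerable_failures 2
quorum_needed 3        # two nodes plus a qdevice
tolerable_failures 3
```

With two votes the cluster tolerates no failures at all, while the third vote from a qdevice lets it ride out the loss of one node.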

A Few Gotchas Worth Knowing

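SSH key mismatch: pvecm add authenticates against the existing node with the root password, but migration and several other cluster operations rely on passwordless root SSH between nodes afterwards. If you see "permission denied" or host key verification errors after joining, run ssh root@<first-node> from the new node to confirm it connects without prompting for a password; if it prompts, running pvecm updatecerts on each node usually repairs the shared keys and known-hosts entries.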
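Storage config is cluster-wide: A change to /etc/pve/storage.cfg on one node is replicated to every node within seconds by pmxcfs; it does not restart Corosync or pause running VMs. The catch is that a new entry becomes active everywhere at once, so a storage that exists on only some nodes should be restricted with the nodes option, otherwise the other nodes report it as inactive.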
Cluster network MTU: If your underlying switches support jumbo frames, set mtu 9000 on the cluster bridge (vmbr1) and its physical port, on every node and on every switch port in the path. Mismatched MTUs cause silent packet drops that show up as slow migrations and intermittent storage timeouts, which makes them one of the hardest problems to diagnose in this setup.
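An MTU mismatch can be detected by pinging with fragmentation disabled and a payload sized to fill exactly one frame. For IPv4 that payload is the MTU minus 28 bytes (20 for the IP header, 8 for the ICMP header). The sketch below only computes the size; the function name icmp_payload_for_mtu is hypothetical, and the example ping assumes the Linux iputils flags -M do and -s.

```shell
# Hypothetical helper: largest ICMP payload that fits into one IPv4 frame
# of the given MTU (20-byte IP header + 8-byte ICMP header = 28 bytes).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Example (on a node, towards the other node's cluster address):
#   ping -M do -s "$(icmp_payload_for_mtu 9000)" -c 3 10.10.0.2
```

If that ping fails while a default-sized ping succeeds, some device on the path is not passing jumbo frames.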

Conclusion

Building a two-node Proxmox cluster from scratch takes about an hour of real work if your network is already wired correctly; the most time-consuming part is usually verifying connectivity rather than running commands. Once you have nodes joined, shared storage configured, and corosync reporting healthy quorum, you can start migrating VMs over — one at a time or all together depending on how much risk you're willing to take.

The next step after your cluster is up is automating backups so that failures don't cascade into data loss. Automated Backups with Proxmox Backup Server covers the backup side of this equation, and if you're planning to run more than two nodes long-term, Configure Parallel Sync Jobs for S3 Offsite Backups will save you significant time.