Proxmox Ceph Storage: Hyper-Converged Cluster Setup

Learn how to deploy Ceph hyper-converged storage directly on Proxmox VE nodes — pools, OSDs, CRUSH maps, and performance tuning covered step by step.

Proxmox Pulse
11 min read

If you've ever wanted enterprise-grade distributed storage without a dedicated SAN or NAS appliance, Proxmox's built-in Ceph integration is worth your full attention. Ceph turns the local disks on your Proxmox nodes into a unified, self-healing storage pool — and the Proxmox web UI makes the whole setup far more approachable than standing up Ceph from scratch.

This guide walks you through deploying a hyper-converged Ceph cluster on Proxmox VE from first principles: monitors, managers, OSDs, pools, and CephFS. By the end, your VMs and LXC containers will live on replicated storage that survives a node failure without you touching anything.

What Is Hyper-Converged Storage?

In a traditional setup, compute and storage are separate concerns — your hypervisor hosts VMs, and a dedicated NAS or SAN hosts the disks. Hyper-convergence collapses those two layers so the same physical nodes handle both workloads.

Ceph makes this possible by running storage daemons alongside your VMs on the same hardware. Each Proxmox node contributes local disks to the Ceph cluster, and Ceph distributes data across those disks with configurable replication. The result is shared storage with no single point of failure and no dedicated appliance to manage.

Prerequisites

Before you start, make sure you have:

  • At least three Proxmox nodes in a cluster (Ceph needs an odd number of monitors for quorum; three is the minimum for true redundancy)
  • Dedicated disks for Ceph on each node — ideally SSDs, and separate from your OS disk
  • A low-latency, dedicated network for Ceph traffic — a separate 10 GbE or 25 GbE link is strongly recommended
  • Proxmox VE 8.x or 9.x (this guide uses 9.x syntax, but steps are nearly identical on 8.x)
  • All nodes joined to the same Proxmox cluster (pvecm status should show all nodes healthy)

A dedicated network for Ceph replication traffic is not optional in production — mixing Ceph replication with VM traffic on the same NIC causes severe latency spikes during heavy I/O.

Setting Up the Ceph Network

Ceph uses two logical networks:

  • Public network — clients (VMs, CephFS mounts) communicate with Ceph monitors and OSDs on this network
  • Cluster network — OSD-to-OSD replication traffic lives here, isolated from clients

Add both interfaces to each node. For a three-node cluster, your network plan might look like:

Node 1: 10.10.10.1/24 (public), 10.10.20.1/24 (cluster)
Node 2: 10.10.10.2/24 (public), 10.10.20.2/24 (cluster)
Node 3: 10.10.10.3/24 (public), 10.10.20.3/24 (cluster)

Edit /etc/network/interfaces on each node or use the Proxmox network configuration UI to add these interfaces. You don't need a bridge for Ceph — plain Ethernet interfaces work fine.
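As a sketch, the relevant stanza on node 1 might look like the following — the NIC names enp1s0 and enp2s0 are placeholders, so substitute the interface names shown by `ip link` on your hardware:

```
# /etc/network/interfaces fragment (node 1)
# Ceph public network
auto enp1s0
iface enp1s0 inet static
    address 10.10.10.1/24

# Ceph cluster (replication) network
auto enp2s0
iface enp2s0 inet static
    address 10.10.20.1/24
```

Apply the change with `ifreload -a` (or a reboot) and repeat with the matching addresses on the other two nodes.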

Installing Ceph on All Nodes

Proxmox ships with Ceph packages in its repositories. Install them on every node in the cluster.

Via the web UI: Navigate to a node → Ceph → click Install Ceph. Select your preferred Ceph version (Squid is current as of early 2026) and click through the wizard.

Via the CLI (run on each node):

pveceph install --version squid

This installs the Ceph packages and configures the apt sources. Do this on all three nodes before proceeding.

Initializing the Ceph Cluster

Run the initial configuration from one node only — the cluster configuration propagates automatically via Proxmox's cluster filesystem (pmxcfs).

pveceph init --network 10.10.10.0/24 --cluster-network 10.10.20.0/24

This creates /etc/ceph/ceph.conf with the correct network settings and generates a cluster FSID. You can verify the config was written:

cat /etc/ceph/ceph.conf

Expected output:

[global]
        auth_cluster_required = cephx
        auth_service_required = cephx
        auth_client_required = cephx
        cluster_network = 10.10.20.0/24
        fsid = <generated-uuid>
        mon_host = 10.10.10.1
        ms_bind_ipv4 = true
        ms_bind_ipv6 = false
        osd_pool_default_size = 3
        public_network = 10.10.10.0/24

Creating Monitors and Managers

Ceph monitors maintain the cluster map — they track OSD state, placement group health, and quorum. You need one monitor per node for redundancy.

Create a monitor on each node using the web UI (Ceph → Monitor → Create) or via CLI:

# Run on each respective node
pveceph mon create

Verify monitors are up and in quorum:

ceph mon stat

You should see all three monitors listed and quorum containing all of them.

Next, create a Ceph Manager (MGR) on at least two nodes. The manager handles the dashboard, metrics, and orchestration modules:

pveceph mgr create

Run this on two nodes. One becomes the active manager; the other stays on standby for failover.

Adding OSDs (Object Storage Daemons)

OSDs are the workhorses of Ceph — each OSD manages one physical disk. For a three-node cluster with two SSDs per node, you'll create six OSDs total.

Check available disks on each node:

lsblk -d -o NAME,SIZE,MODEL,ROTA

Disks intended for Ceph should have no existing partitions. If they do, wipe them first:

wipefs -a /dev/sdX

Create OSDs via the web UI: Navigate to Node → Ceph → OSD → Create OSD. Select the disk, choose your DB/WAL device if you have a separate NVMe for metadata, and click Create.

Create OSDs via CLI:

pveceph osd create /dev/sdb

If you have a fast NVMe for the BlueStore DB and WAL (write-ahead log), specify it:

pveceph osd create /dev/sdb --db_dev /dev/nvme0n1 --db_dev_size 30 --wal_dev /dev/nvme0n1 --wal_dev_size 10

Placing the BlueStore DB on NVMe dramatically improves random I/O performance — especially for spinning disks.
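How big should the DB partition be? A commonly cited rule of thumb is roughly 1–4% of the data disk for RBD-heavy workloads; the sketch below uses the 4% upper bound with an illustrative disk size:

```shell
# Rough BlueStore DB sizing sketch (rule of thumb, not a hard requirement):
# reserve ~1-4% of the data disk; 4% is a safe upper bound for RBD workloads.
OSD_SIZE_GIB=3726                          # e.g. a 4 TB data disk
DB_SIZE_GIB=$(( OSD_SIZE_GIB * 4 / 100 ))  # 4% of the data device
echo "Suggested BlueStore DB size: ${DB_SIZE_GIB} GiB"
```

Undersizing the DB is worse than oversizing it — once the DB volume fills, metadata spills over onto the slow data device and the NVMe advantage disappears.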

Repeat OSD creation on all nodes. Check OSD status:

ceph osd tree

A healthy cluster shows all OSDs as up and in:

ID  CLASS  WEIGHT   TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         3.63919  root default
-3         1.21306      host pve1
 0    ssd  0.60653          osd.0       up   1.00000  1.00000
 1    ssd  0.60653          osd.1       up   1.00000  1.00000
-5         1.21306      host pve2
 2    ssd  0.60653          osd.2       up   1.00000  1.00000
 3    ssd  0.60653          osd.3       up   1.00000  1.00000
-7         1.21306      host pve3
 4    ssd  0.60653          osd.4       up   1.00000  1.00000
 5    ssd  0.60653          osd.5       up   1.00000  1.00000

Creating Storage Pools

Ceph organizes data into pools. Proxmox needs at least one pool for VM disk images (RBD — RADOS Block Device).

Create a pool via the web UI: Ceph → Pools → Create. Set the pool name, replica size (3 is standard), and PG autoscaling.

Create a pool via CLI:

pveceph pool create vm-store --pg_autoscale_mode on --size 3 --min_size 2

The --min_size 2 setting means writes are accepted as long as two of the three replicas acknowledge — allowing the cluster to remain writable with one OSD down.
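Keep the replication overhead in mind when planning capacity: with size=3, every byte is stored three times, so usable space is roughly one third of raw space (before Ceph's default near-full warnings around 85%). A quick back-of-envelope with illustrative numbers:

```shell
# Usable-capacity sketch for a 3-way replicated pool (illustrative figures).
RAW_GIB=3726          # total raw capacity across all OSDs
REPLICAS=3            # pool "size" setting
USABLE_GIB=$(( RAW_GIB / REPLICAS ))
echo "Usable: ${USABLE_GIB} GiB of ${RAW_GIB} GiB raw"
```

In practice plan to stay below ~80% of that usable figure so the cluster has room to rebalance after an OSD failure.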

Add the pool as Proxmox storage: The web UI does this automatically when you create a pool via the Proxmox interface. Via CLI:

pvesm add rbd vm-store --pool vm-store --content images,rootdir --krbd 0

Verify it shows up:

pvesm status

Setting Up CephFS for Shared Storage

CephFS provides a POSIX-compliant distributed filesystem on top of your Ceph cluster. It's useful for ISO storage, backups, and container templates that need to be accessible from all nodes without an NFS server.

Create the CephFS metadata server (MDS) on two nodes:

pveceph mds create

Then create the filesystem:

pveceph fs create --name cephfs --pg_num 64

This creates two underlying Ceph pools — cephfs_data and cephfs_metadata. Add CephFS as Proxmox storage:

pvesm add cephfs cephfs-storage \
  --fs-name cephfs \
  --content vztmpl,iso,backup \
  --path /mnt/pve/cephfs-storage

Now ISO images and LXC templates uploaded from any node are immediately visible cluster-wide.

Tuning Ceph Performance

Default Ceph settings are conservative. A few targeted changes make a measurable difference in homelab and small production environments.

Placement Group Count

With autoscaling enabled, Ceph handles PG count automatically. Monitor for warnings:

ceph health detail

If you see HEALTH_WARN too few PGs, increase manually or let the autoscaler catch up:

ceph osd pool set vm-store pg_num 128
ceph osd pool set vm-store pgp_num 128
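For intuition about where these numbers come from, the classic manual sizing rule (which the autoscaler approximates) is: target PGs ≈ (OSD count × 100) ÷ replica count, rounded to a power of two. A sketch for the six-OSD cluster in this guide:

```shell
# Rule-of-thumb PG sizing sketch: (OSDs * 100) / replicas, rounded up to a
# power of two. Treat this as a starting point; the autoscaler refines it.
OSDS=6; REPLICAS=3
TARGET=$(( OSDS * 100 / REPLICAS ))   # 200 for this cluster
PG_NUM=1
while [ "$PG_NUM" -lt "$TARGET" ]; do PG_NUM=$(( PG_NUM * 2 )); done
echo "Suggested pg_num: ${PG_NUM}"
```

Note the rule counts PGs across all pools on the same OSDs, so if you run several pools (RBD plus the two CephFS pools), split the budget between them rather than giving each pool the full amount.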

OSD Memory Target

BlueStore allocates cache dynamically. For nodes with 64 GB+ RAM, increase the OSD memory target:

ceph config set osd osd_memory_target 8G

This gives each OSD daemon 8 GB of cache, significantly improving read performance for hot data.
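Before raising the target, sanity-check that the OSD caches still leave enough RAM for your VMs — remember the figure is per OSD daemon, not per node. A quick check with assumed figures:

```shell
# Memory headroom sketch (all figures are illustrative assumptions):
# total OSD cache = OSDs per node * per-OSD target; the rest is for VMs + OS.
NODE_RAM_GIB=64; OSDS_PER_NODE=2; OSD_TARGET_GIB=8
OSD_TOTAL_GIB=$(( OSDS_PER_NODE * OSD_TARGET_GIB ))
VM_HEADROOM_GIB=$(( NODE_RAM_GIB - OSD_TOTAL_GIB - 4 ))   # ~4 GiB for the OS
echo "OSD cache: ${OSD_TOTAL_GIB} GiB, left for VMs: ${VM_HEADROOM_GIB} GiB"
```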

RBD Cache for VMs

Enable RBD caching in the librbd client that QEMU uses for better VM I/O:

ceph config set client rbd_cache true
ceph config set client rbd_cache_size 67108864
ceph config set client rbd_cache_max_dirty 50331648

The osd_memory_target change takes effect after a rolling restart of the OSDs (one at a time, waiting for the cluster to return to HEALTH_OK between restarts); the client-side RBD cache settings apply the next time each VM is started or live-migrated:

systemctl restart ceph-osd@0.service

Checking Cluster Health

Ceph provides rich diagnostics. These are the commands you'll use daily:

# Overall health summary
ceph status

# Detailed health warnings
ceph health detail

# Watch I/O in real time
ceph -w

# OSD utilization
ceph osd df tree

# Pool statistics
ceph df detail

# Placement group status
ceph pg stat

A healthy cluster reports HEALTH_OK with all PGs in active+clean state. During rebalancing after adding an OSD, you'll see active+remapped or active+backfilling — this is normal and resolves automatically.

Recovering from OSD Failures

When an OSD fails (disk error, node reboot), Ceph marks it down and begins remapping affected placement groups to surviving OSDs. With size=3, min_size=2, the cluster stays fully operational with one OSD down.

If the OSD comes back after a short outage, Ceph automatically marks it up again and syncs any missed writes. For a permanently failed disk:

# Mark the OSD out so Ceph rebalances data off it
ceph osd out osd.3

# Wait for rebalancing to complete (watch with: ceph -w)

# Then remove the OSD and wipe its disk
pveceph osd destroy 3 --cleanup

After rebalancing, replace the disk, and create a new OSD on the replacement:

pveceph osd create /dev/sdb

Ceph handles data redistribution automatically — no manual intervention needed beyond these commands.

Migrating Existing VMs to Ceph Storage

Once your Ceph pool is working, you can live-migrate existing VMs from local storage to Ceph without downtime:

  1. In the Proxmox web UI, right-click the VM → Migrate
  2. Select the target node and set Target Storage to your Ceph pool
  3. Click Migrate — the VM disk streams to Ceph while the VM keeps running

Via CLI:

qm migrate 101 pve2 --online --targetstorage vm-store

After migration, the VM can be live-migrated between any cluster node because its disk lives in Ceph — shared storage accessible from all nodes simultaneously.

Ceph Dashboard

Proxmox includes the Ceph Dashboard via the MGR dashboard module. Access it at:

https://<node-ip>:8443 (on the node running the active manager)

The dashboard shows:

  • Cluster health and capacity
  • OSD map and status
  • I/O graphs (read/write IOPS, throughput, latency)
  • Pool usage and PG distribution
  • RBD image list

Enable it if not already active:

ceph mgr module enable dashboard
ceph dashboard create-self-signed-cert

Set an admin password (recent Ceph releases read it from a file rather than the command line):

echo -n 'yourpassword' > /tmp/dash-pass
ceph dashboard ac-user-set-password admin -i /tmp/dash-pass
rm /tmp/dash-pass

Common Gotchas

  • Don't use the same disk for OS and Ceph. BlueStore writes heavily and will wear out a shared OS disk faster while also impacting system stability.
  • Ceph hates clock skew. All nodes must have NTP synchronized. Run chronyc tracking on each node and verify offset is under 0.05 seconds.
  • Three nodes is the minimum for production. A two-node Ceph cluster can't maintain quorum when one node goes down — you'll lose access to your data entirely.
  • Network bandwidth matters more than you think. A three-replica cluster writes each byte three times. 1 GbE will saturate quickly under real load. Budget for 10 GbE on the cluster network.
  • Don't mix HDD and SSD OSDs in the same pool without CRUSH rule adjustments — Ceph will mix them and your performance will be bottlenecked by the slowest disks.
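To put numbers on the bandwidth point: the primary OSD forwards every client write to the two replica OSDs, so the cluster network carries roughly twice the client write rate. A back-of-envelope sketch with illustrative figures:

```shell
# Replication traffic sketch: with size=3, each client write generates
# (replicas - 1) copies on the cluster network (figures are illustrative).
CLIENT_WRITE_MBS=300; REPLICAS=3
CLUSTER_TRAFFIC_MBS=$(( CLIENT_WRITE_MBS * (REPLICAS - 1) ))
CLUSTER_TRAFFIC_GBIT=$(( CLUSTER_TRAFFIC_MBS * 8 / 1000 ))
echo "Cluster-network load: ${CLUSTER_TRAFFIC_MBS} MB/s (~${CLUSTER_TRAFFIC_GBIT} Gbit/s)"
```

At 300 MB/s of sustained VM writes the cluster network already needs ~4.8 Gbit/s — far beyond what 1 GbE can carry.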

Conclusion

Ceph on Proxmox is one of the most powerful things you can add to a homelab or small private cloud setup. What used to require dedicated hardware and deep Ceph expertise is now approachable through the Proxmox UI — though understanding what's happening under the hood (monitors, OSDs, CRUSH maps, PGs) lets you tune it properly and recover confidently when things go wrong.

The key takeaways: use three or more nodes, dedicate separate networks for public and cluster traffic, put your BlueStore DB on NVMe if your budget allows, and let PG autoscaling handle the tuning math. Once your cluster is HEALTH_OK, you get live migration, self-healing storage, and no-SAN redundancy — all running on hardware you already own.
