Proxmox Ceph Storage: Hyper-Converged Cluster Setup
Learn how to deploy Ceph hyper-converged storage directly on Proxmox VE nodes — pools, OSDs, CRUSH maps, and performance tuning covered step by step.
If you've ever wanted enterprise-grade distributed storage without a dedicated SAN or NAS appliance, Proxmox's built-in Ceph integration is worth your full attention. Ceph turns the local disks on your Proxmox nodes into a unified, self-healing storage pool — and the Proxmox web UI makes the whole setup far more approachable than standing up Ceph from scratch.
This guide walks you through deploying a hyper-converged Ceph cluster on Proxmox VE from first principles: monitors, managers, OSDs, pools, and CephFS. By the end, your VMs and LXC containers will live on replicated storage that survives a node failure without you touching anything.
What Is Hyper-Converged Storage?
In a traditional setup, compute and storage are separate concerns — your hypervisor hosts VMs, and a dedicated NAS or SAN hosts the disks. Hyper-convergence collapses those two layers so the same physical nodes handle both workloads.
Ceph makes this possible by running storage daemons alongside your VMs on the same hardware. Each Proxmox node contributes local disks to the Ceph cluster, and Ceph distributes data across those disks with configurable replication. The result is shared storage with no single point of failure and no dedicated appliance to manage.
Prerequisites
Before you start, make sure you have:
- At least three Proxmox nodes in a cluster (Ceph monitors need a majority to maintain quorum, so an odd count is recommended; three is the minimum for true redundancy)
- Dedicated disks for Ceph on each node — ideally SSDs, and separate from your OS disk
- A low-latency, dedicated network for Ceph traffic — a separate 10 GbE or 25 GbE link is strongly recommended
- Proxmox VE 8.x or 9.x (this guide uses 9.x syntax, but steps are nearly identical on 8.x)
- All nodes joined to the same Proxmox cluster (pvecm status should show all nodes healthy)
A dedicated network for Ceph replication traffic is not optional in production — mixing Ceph replication with VM traffic on the same NIC causes severe latency spikes during heavy I/O.
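Before touching Ceph, it's worth confirming the Proxmox cluster itself is healthy. A quick check from any node:
# Confirm cluster membership and quorum
pvecm status
The output should list all three nodes under the membership information and report Quorate: Yes.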
Setting Up the Ceph Network
Ceph uses two logical networks:
- Public network — clients (VMs, CephFS mounts) communicate with Ceph monitors and OSDs on this network
- Cluster network — OSD-to-OSD replication traffic lives here, isolated from clients
Add both interfaces to each node. For a three-node cluster, your network plan might look like:
Node 1: 10.10.10.1/24 (public), 10.10.20.1/24 (cluster)
Node 2: 10.10.10.2/24 (public), 10.10.20.2/24 (cluster)
Node 3: 10.10.10.3/24 (public), 10.10.20.3/24 (cluster)
Edit /etc/network/interfaces on each node or use the Proxmox network configuration UI to add these interfaces. You don't need a bridge for Ceph — plain Ethernet interfaces work fine.
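As a sketch, node 1's stanzas might look like the following, assuming the two dedicated NICs are named ens19 and ens20 (substitute your actual interface names):
# Ceph public network
auto ens19
iface ens19 inet static
    address 10.10.10.1/24

# Ceph cluster (replication) network
auto ens20
iface ens20 inet static
    address 10.10.20.1/24
Apply the change without a reboot using ifreload -a (Proxmox ships ifupdown2).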
Installing Ceph on All Nodes
Proxmox ships with Ceph packages in its repositories. Install them on every node in the cluster.
Via the web UI: Navigate to a node → Ceph → click Install Ceph. Select your preferred Ceph version (Squid is current as of early 2026) and click through the wizard.
Via the CLI (run on each node):
pveceph install --version squid
This installs the Ceph packages and configures the apt sources. Do this on all three nodes before proceeding.
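If you prefer the shell, a small loop handles all nodes in one pass. This sketch assumes the hostnames pve1 through pve3 used later in this guide, plus working root SSH access between nodes:
# Install Ceph packages on every node
for host in pve1 pve2 pve3; do
  ssh root@$host "pveceph install --version squid"
done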
Initializing the Ceph Cluster
Run the initial configuration from one node only — the cluster configuration propagates automatically via Proxmox's cluster filesystem (pmxcfs).
pveceph init --network 10.10.10.0/24 --cluster-network 10.10.20.0/24
This creates /etc/ceph/ceph.conf with the correct network settings and generates a cluster FSID. You can verify the config was written:
cat /etc/ceph/ceph.conf
Expected output:
[global]
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
cluster_network = 10.10.20.0/24
fsid = <generated-uuid>
mon_host = 10.10.10.1
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_size = 3
public_network = 10.10.10.0/24
Creating Monitors and Managers
Ceph monitors maintain the cluster map — they track OSD state, placement group health, and quorum. You need one monitor per node for redundancy.
Create a monitor on each node using the web UI (Ceph → Monitor → Create) or via CLI:
# Run on each respective node
pveceph mon create
Verify monitors are up and in quorum:
ceph mon stat
You should see all three monitors listed and quorum containing all of them.
Next, create a Ceph Manager (MGR) on at least two nodes. The manager handles the dashboard, metrics, and orchestration modules:
pveceph mgr create
Run this on two nodes. One becomes the active manager; the other stays on standby for failover.
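You can check which manager is active and which is standing by:
# Show the active manager and any standbys
ceph mgr stat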
Adding OSDs (Object Storage Daemons)
OSDs are the workhorses of Ceph — each OSD manages one physical disk. For a three-node cluster with two SSDs per node, you'll create six OSDs total.
Check available disks on each node:
lsblk -d -o NAME,SIZE,MODEL,ROTA
Disks intended for Ceph should have no existing partitions. If they do, wipe them first:
wipefs -a /dev/sdX
Create OSDs via the web UI: Navigate to Node → Ceph → OSD → Create OSD. Select the disk, choose your DB/WAL device if you have a separate NVMe for metadata, and click Create.
Create OSDs via CLI:
pveceph osd create /dev/sdb
If you have a fast NVMe for the BlueStore DB and WAL (write-ahead log), specify it:
pveceph osd create /dev/sdb --db_dev /dev/nvme0n1 --db_dev_size 30 --wal_dev /dev/nvme0n1 --wal_dev_size 10
Placing the BlueStore DB on NVMe dramatically improves random I/O performance — especially for spinning disks.
Repeat OSD creation on all nodes. Check OSD status:
ceph osd tree
A healthy cluster shows all OSDs as up and in:
ID  CLASS  WEIGHT   TYPE NAME      STATUS  REWEIGHT  PRI-AFF
-1         3.63919  root default
-3         1.21306      host pve1
 0    ssd  0.60653          osd.0      up   1.00000  1.00000
 1    ssd  0.60653          osd.1      up   1.00000  1.00000
-5         1.21306      host pve2
 2    ssd  0.60653          osd.2      up   1.00000  1.00000
 3    ssd  0.60653          osd.3      up   1.00000  1.00000
-7         1.21306      host pve3
 4    ssd  0.60653          osd.4      up   1.00000  1.00000
 5    ssd  0.60653          osd.5      up   1.00000  1.00000
Creating Storage Pools
Ceph organizes data into pools. Proxmox needs at least one pool for VM disk images (RBD — RADOS Block Device).
Create a pool via the web UI: Ceph → Pools → Create. Set the pool name, replica size (3 is standard), and PG autoscaling.
Create a pool via CLI:
pveceph pool create vm-store --pg_autoscale_mode on --size 3 --min_size 2
The --min_size 2 setting means writes are accepted as long as two of the three replicas acknowledge — allowing the cluster to remain writable with one OSD down.
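Replication also sets your capacity budget: with size 3, every byte is stored three times, so the six 0.6 TiB OSDs from the earlier example (roughly 3.64 TiB raw) yield only about 1.2 TiB of usable space. Check what Ceph reports:
# MAX AVAIL per pool already accounts for the 3x replication overhead
ceph df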
Add the pool as Proxmox storage: The web UI does this automatically when you create a pool via the Proxmox interface. Via CLI:
pvesm add rbd vm-store --pool vm-store --content images,rootdir --krbd 0
Verify it shows up:
pvesm status
Setting Up CephFS for Shared Storage
CephFS provides a POSIX-compliant distributed filesystem on top of your Ceph cluster. It's useful for ISO storage, backups, and container templates that need to be accessible from all nodes without an NFS server.
Create the CephFS metadata server (MDS) on two nodes:
pveceph mds create
Then create the filesystem:
pveceph fs create --name cephfs --pg_num 64
This creates two underlying Ceph pools — cephfs_data and cephfs_metadata. Add CephFS as Proxmox storage:
pvesm add cephfs cephfs-storage \
--fs-name cephfs \
--content vztmpl,iso,backup \
--path /mnt/pve/cephfs-storage
Now ISO images and LXC templates uploaded from any node are immediately visible cluster-wide.
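To confirm the mount is live, inspect it from any node (Proxmox mounts CephFS storage under /mnt/pve/<storage-name>):
# The filesystem type should show as ceph on every node
df -hT /mnt/pve/cephfs-storage
ls /mnt/pve/cephfs-storage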
Tuning Ceph Performance
Default Ceph settings are conservative. A few targeted changes make a measurable difference in homelab and small production environments.
Placement Group Count
With autoscaling enabled, Ceph handles PG count automatically. Monitor for warnings:
ceph health detail
If you see HEALTH_WARN too few PGs, increase manually or let the autoscaler catch up:
ceph osd pool set vm-store pg_num 128
ceph osd pool set vm-store pgp_num 128
OSD Memory Target
BlueStore allocates cache dynamically. For nodes with 64 GB+ RAM, increase the OSD memory target:
ceph config set osd osd_memory_target 8G
This gives each OSD daemon 8 GB of cache, significantly improving read performance for hot data.
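Remember the target is per OSD daemon: two OSDs at 8 GB each reserve around 16 GB on top of VM memory, so size it against the node's free RAM. Verify the running value:
# Prints the configured target in bytes
ceph config get osd osd_memory_target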
RBD Cache for VMs
Enable RBD caching for QEMU's librbd clients for better VM I/O:
ceph config set client rbd_cache true
ceph config set client rbd_cache_size 67108864
ceph config set client rbd_cache_max_dirty 50331648
The rbd_cache settings are client-side: they take effect the next time a VM's QEMU process starts, so a stop/start or a live migration picks them up. The osd_memory_target change is applied to running OSDs at runtime, but for OSD-level options that do require a restart, restart daemons one at a time:
systemctl restart ceph-osd@0.service
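A cautious rolling restart waits for the cluster to settle between daemons. A minimal sketch, run on each node for the OSD IDs that node hosts (see ceph osd tree):
# e.g. pve1 hosts osd.0 and osd.1
for id in 0 1; do
  systemctl restart ceph-osd@${id}.service
  until ceph health | grep -q HEALTH_OK; do sleep 10; done
done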
Checking Cluster Health
Ceph provides rich diagnostics. These are the commands you'll use daily:
# Overall health summary
ceph status

# Detailed health warnings
ceph health detail

# Watch I/O in real time
ceph -w

# OSD utilization
ceph osd df tree

# Pool statistics
ceph df detail

# Placement group status
ceph pg stat
A healthy cluster reports HEALTH_OK with all PGs in active+clean state. During rebalancing after adding an OSD, you'll see active+remapped or active+backfilling — this is normal and resolves automatically.
Recovering from OSD Failures
When an OSD fails (disk error, node reboot), Ceph marks it down and begins remapping affected placement groups to surviving OSDs. With size=3, min_size=2, the cluster stays fully operational with one OSD down.
If the OSD comes back after a short outage, Ceph automatically marks it back in and syncs any missed writes. For a permanently failed disk:
# Mark the OSD out so Ceph rebalances data off it
ceph osd out osd.3

# Wait for rebalancing to complete (watch ceph -w)

# Stop the OSD service on its node, then remove the OSD
systemctl stop ceph-osd@3.service
pveceph osd destroy 3 --cleanup
After rebalancing, replace the disk, and create a new OSD on the replacement:
pveceph osd create /dev/sdb
Ceph handles data redistribution automatically — no manual intervention needed beyond these commands.
Migrating Existing VMs to Ceph Storage
Once your Ceph pool is working, you can live-migrate existing VMs from local storage to Ceph without downtime:
- In the Proxmox web UI, right-click the VM → Migrate
- Select the target node and set Target Storage to your Ceph pool
- Click Migrate — the VM disk streams to Ceph while the VM keeps running
Via CLI:
qm migrate 101 pve2 --online --targetstorage vm-store
After migration, the VM can be live-migrated between any cluster node because its disk lives in Ceph — shared storage accessible from all nodes simultaneously.
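If you only want the disk on Ceph without changing nodes, you can move the disk alone. A sketch, assuming VM 101's disk is scsi0 (check the VM's Hardware tab for the real disk ID):
# Move only the disk to the Ceph pool; --delete 1 removes the old copy
qm disk move 101 scsi0 vm-store --delete 1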
Ceph Dashboard
Proxmox includes the Ceph Dashboard via the MGR dashboard module. Access it at:
https://<node-ip>:8443 (the dashboard module's default HTTPS port)
The dashboard shows:
- Cluster health and capacity
- OSD map and status
- I/O graphs (read/write IOPS, throughput, latency)
- Pool usage and PG distribution
- RBD image list
Enable it if not already active:
ceph mgr module enable dashboard
ceph dashboard create-self-signed-cert
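If the enable command complains that the dashboard module is not available, note that Debian packaging splits it out of ceph-mgr; install it on each node that runs a manager, then retry:
apt install ceph-mgr-dashboard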
Set an admin password (recent Ceph releases require reading it from a file rather than taking it on the command line):
echo -n 'YourSecurePassword' > /root/dashboard-pass
ceph dashboard ac-user-set-password admin -i /root/dashboard-pass
Common Gotchas
- Don't use the same disk for OS and Ceph. BlueStore writes heavily and will wear out a shared OS disk faster while also impacting system stability.
- Ceph hates clock skew. All nodes must have NTP synchronized. Run chronyc tracking on each node and verify the offset is under 0.05 seconds.
- Three nodes is the minimum for production. A two-node Ceph cluster can't maintain quorum when one node goes down — you'll lose access to your data entirely.
- Network bandwidth matters more than you think. A three-replica cluster writes each byte three times. 1 GbE will saturate quickly under real load. Budget for 10 GbE on the cluster network.
- Don't mix HDD and SSD OSDs in the same pool without CRUSH rule adjustments. Ceph will happily spread data across both, and performance gets bottlenecked by the slowest disks; a device-class CRUSH rule (sketched just below) keeps them apart.
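A minimal sketch of class-aware CRUSH rules, assuming your OSDs carry the ssd and hdd device classes shown by ceph osd tree:
# Create replicated rules that only select OSDs of one device class
ceph osd crush rule create-replicated ssd-only default host ssd
ceph osd crush rule create-replicated hdd-only default host hdd

# Pin the VM pool to the SSD-only rule
ceph osd pool set vm-store crush_rule ssd-only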
Conclusion
Ceph on Proxmox is one of the most powerful things you can add to a homelab or small private cloud setup. What used to require dedicated hardware and deep Ceph expertise is now approachable through the Proxmox UI — though understanding what's happening under the hood (monitors, OSDs, CRUSH maps, PGs) lets you tune it properly and recover confidently when things go wrong.
The key takeaways: use three or more nodes, dedicate separate networks for public and cluster traffic, put your BlueStore DB on NVMe if your budget allows, and let PG autoscaling handle the tuning math. Once your cluster is HEALTH_OK, you get live migration, self-healing storage, and no-SAN redundancy — all running on hardware you already own.