Optimizing ZFS Pools in Proxmox VE: Storage Tuning Guide

Tune your ZFS pool for better read latency and write performance on Proxmox with L2ARC, SLOG, snapshots, parallel sync jobs, and practical configuration examples.

Proxmox Pulse
10 min read

I've spent the last six months tuning ZFS pools and backup schedules across three Proxmox clusters, from homelab setups to small production deployments, and I keep hitting the same wall: most people configure their storage correctly, but they leave performance on the table in details nobody notices until something breaks. The real question isn't whether your pool should be RAIDZ2 or mirrors, or whether it needs a SLOG; it's how you arrange cache tiers, tune write buffers, and coordinate snapshots with backup jobs so that neither one starves the other during peak load.

Key Takeaways

  • Cache Tiers: Use a separate SSD for L2ARC caching to cut read latency on large VM workloads whose working set outgrows RAM, serving those reads without spending your main pool's disk IOPS budget.
  • Write Tuning: Set ashift=12 at zpool creation time, then tune zfs_dirty_data_max for write-heavy databases running in KVM guests, and consider zfs_prefetch_disable if their reads are mostly random.
  • Snapshot Coordination: Schedule ZFS snapshots to finish shortly before PBS backup windows rather than during them, so snapshot and backup I/O don't pile onto live writes at the same time.
  • S3 Offsite Sync: Parallel sync jobs on Proxmox Backup Server 4.2 cut offsite replication time by roughly half compared to single-threaded rclone transfers, especially for large datasets over constrained WAN links.

What's Different About ZFS on Proxmox?

ZFS has been a first-class storage option in Proxmox VE for years, selectable right in the installer, and it works well out of the box, but "well" isn't the same as optimized. When I first started working with ZFS-backed VMs that were also serving database workloads (the kind of VM described in Access a Linux VM on Proxmox from Windows via RDP), I noticed something odd: the pool's write performance was fine, but read latency spiked unpredictably whenever multiple VMs hit disk simultaneously.

The root cause turned out to be ZFS's default caching behavior: a single ARC (Adaptive Replacement Cache) in RAM that is shared by every dataset on the host, with no second-level cache (L2ARC) unless you add one. For most homelab setups this is perfectly adequate (see Build a Private Cloud at Home with Proxmox VE for a great overview of ZFS basics), but when you're running multiple VMs, LXC containers, and backup jobs concurrently, their combined working set can outgrow RAM and the cache starts thrashing.

Here's what I've found works: create your pool with ashift=12 (4K physical sector alignment), then add an SSD specifically for caching. Reads that fall out of the RAM-based ARC can then be served from the L2ARC device instead of the main pool disks, which can substantially reduce read latency without competing for the spinning disks' limited IOPS:

# Create the initial ZFS pool on spinning disks
zpool create -o ashift=12 mypool raidz2 /dev/disk/by-id/ata-WD40EFRX_... \
  /dev/disk/by-id/ata-WD40EFRX_... /dev/disk/by-id/ata-WD40EFRX_...

# Add an SSD as an L2ARC read cache (it only holds copies, so losing it loses no data)
zpool add mypool cache /dev/disk/by-id/nvme-Samsung_SSD_980_PRO ...

# Verify the cache is active
zpool status -v mypool | grep -A 5 "cache"
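The ashift value is simply log2 of the sector size ZFS should assume, so ashift=12 means 4096-byte blocks. The hypothetical helper below (a sketch, not part of ZFS) turns the PHY-SEC value reported by lsblk -d -o NAME,PHY-SEC into an ashift. Many admins force ashift=12 regardless, because some drives report 512-byte sectors while using 4K internally, and ashift cannot be changed once a vdev exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ZFS): map a physical sector size in bytes
# to the matching ashift, i.e. log2(sector size).
# Typical input: the PHY-SEC column of `lsblk -d -o NAME,PHY-SEC`.
ashift_for() {
  local sector=$1 bits=0 size=1
  # Double `size` until it reaches the sector size, counting the doublings
  while (( size < sector )); do
    size=$(( size * 2 ))
    bits=$(( bits + 1 ))
  done
  # A real sector size is always a power of two; reject anything else
  if (( size != sector )); then
    echo "not a power of two: $sector" >&2
    return 1
  fi
  echo "$bits"
}

ashift_for 4096   # prints 12
ashift_for 512    # prints 9
```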

How to Tune ZFS Write Performance for VM Workloads?

Write performance on ZFS depends heavily on how you configure the intent log (the ZIL, optionally placed on a separate SLOG device) and how much dirty data ZFS may buffer in RAM. If your pool doesn't have a dedicated SLOG device, the ZIL lives on the main pool disks, so every synchronous write (including the flush requests guests send through virtio, whether the disk uses cache=none or writeback) lands on the same disks that are serving everything else. This can be fine for light workloads but becomes a bottleneck quickly.

A dedicated NVMe-backed SLOG can deliver sub-millisecond latency for sync writes:

# Add an NVMe device as SLOG (separate intent log); use a different device or partition than the L2ARC cache
zpool add mypool log /dev/disk/by-id/nvme-Samsung_SSD_980_PRO ...

# Verify it's attached properly: the device under "logs" should show ONLINE
zpool status -v mypool | grep -A 3 "log"
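If you'd rather script the check than eyeball grep output, the hypothetical check_slog function below is a sketch that parses zpool status text: it prints each device under the logs section with its state and fails if any of them is not ONLINE. It assumes the usual zpool status layout, where the logs, cache, and spares section names appear on their own lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `zpool status <pool>` output on stdin, print each
# device listed under "logs" with its state, and return non-zero if there is
# no logs section or any log device is not ONLINE.
check_slog() {
  awk '
    $1 == "logs" { in_logs = 1; found = 1; next }
    # These section headers (or the errors: line) end the logs section
    $1 == "cache" || $1 == "spares" || $1 == "errors:" { in_logs = 0 }
    in_logs && NF >= 2 { print $1, $2; if ($2 != "ONLINE") bad = 1 }
    END { exit (found && !bad) ? 0 : 1 }
  '
}
```

On the host you would run zpool status mypool | check_slog && echo "SLOG healthy".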

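The tradeoff here is cost versus performance: a mid-range NVMe like the Samsung 970 EVO Plus or WD Black SN580 costs around $60-120 and can make a large difference for sync-heavy workloads, but a SLOG only accelerates synchronous writes, so it's wasted if your pool mostly handles sequential reads (like media storage). Consumer drives like these also lack power-loss protection, so for production pools a drive with PLP is the safer choice for a SLOG.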
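How big should a SLOG be? It only has to hold the sync writes that arrive between transaction-group commits, so it can be far smaller than the drive. A commonly cited rule of thumb (an assumption here, not an OpenZFS formula) is about two transaction groups' worth: peak sync-write throughput (for example, the line rate of the network feeding your guests) times zfs_txg_timeout (5 seconds by default) times 2. The hypothetical slog_min_mib helper just does that arithmetic:

```shell
#!/usr/bin/env bash
# Hypothetical helper based on a common rule of thumb (not an OpenZFS formula):
# a SLOG should hold about two transaction groups of sync writes.
#   $1 = peak sync-write throughput in MiB/s
#   $2 = zfs_txg_timeout in seconds (OpenZFS default: 5)
slog_min_mib() {
  local throughput_mib_s=$1 txg_timeout_s=${2:-5}
  echo $(( throughput_mib_s * txg_timeout_s * 2 ))
}

slog_min_mib 1190      # ~10GbE line rate: prints 11900 (about 11.6 GiB)
slog_min_mib 119 5     # ~1GbE line rate: prints 1190
```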

For VMs specifically, I've found these ZFS tunables make the biggest difference:

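# /etc/modprobe.d/zfs.conf: add or edit this file (values are plain byte counts)
# Example: cap the ARC at 8 GiB so guests keep enough RAM
options zfs zfs_arc_max=8589934592
# Example for a 32GB+ host: allow up to 4 GiB of dirty (not yet written) data
options zfs zfs_dirty_data_max=4294967296
# 0 keeps prefetch enabled (the default); 1 disables it for random-read databases
options zfs zfs_prefetch_disable=0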

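The zfs_dirty_data_max setting controls how much modified data ZFS will buffer in RAM before it starts throttling writers; transaction groups are flushed to the vdevs as that buffer fills, or every zfs_txg_timeout seconds (5 by default). The default is 10% of physical RAM, capped at 4 GiB on 64-bit hosts. I raise it when running database VMs or Docker containers that generate lots of small writes, so ZFS can aggregate more of them per transaction group. Values that are too low can actually hurt performance, especially on SSD-backed pools, because they force smaller, more frequent commits and earlier write throttling.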
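The options in /etc/modprobe.d/zfs.conf take plain integers, so sizes have to be written out in bytes. The hypothetical to_bytes helper below is a small sketch that converts K/M/G values (binary units) into the form the file expects:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a size with an optional K/M/G suffix (binary
# units) into the plain byte count used by `options zfs ...` lines.
to_bytes() {
  local value=${1%[KkMmGg]} unit=${1: -1}
  case $unit in
    K|k) echo $(( value * 1024 )) ;;
    M|m) echo $(( value * 1024 * 1024 )) ;;
    G|g) echo $(( value * 1024 * 1024 * 1024 )) ;;
    *)   echo $(( $1 )) ;;  # no suffix: value is already in bytes
  esac
}

to_bytes 128M   # prints 134217728
to_bytes 4G     # prints 4294967296
echo "options zfs zfs_dirty_data_max=$(to_bytes 4G)"
```

To try a value without rebooting, you can write it to /sys/module/zfs/parameters/zfs_dirty_data_max (or the matching file for another parameter). If your Proxmox host boots from ZFS, run update-initramfs -u -k all after editing zfs.conf so the new options are applied at boot.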

One honest tradeoff: raising zfs_dirty_data_max lets ZFS absorb larger write bursts before it throttles writers, but it also means more modified data sits in RAM (dirty buffers live in the ARC) before being committed to disk. On a 32GB system this is usually manageable; on an 8GB homelab machine running Cockpit alongside your VMs (see Cockpit on Proxmox: Manage KVM, LXC, and Docker in One UI), you should monitor memory with arc_summary, /proc/spl/kstat/zfs/arcstats, or the simple free -h.
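arc_summary gives the full picture, but for a quick scripted check you can read the raw counters yourself. The hypothetical arc_stat and arc_report functions below are a sketch that parses /proc/spl/kstat/zfs/arcstats, where each row after the two header lines is name, type, value; size is the current ARC size and c_max its ceiling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: read ARC counters from the OpenZFS kstat file on Linux.
# Each row after the two header lines is "<name> <type> <value>".
ARCSTATS=/proc/spl/kstat/zfs/arcstats

# Print the value of one counter, e.g. arc_stat size
arc_stat() {
  local field=$1 file=${2:-$ARCSTATS}
  awk -v f="$field" 'NR > 2 && $1 == f { print $3 }' "$file"
}

# Summarize current ARC size against its ceiling (c_max), in MiB
arc_report() {
  local file=${1:-$ARCSTATS} size c_max
  size=$(arc_stat size "$file")
  c_max=$(arc_stat c_max "$file")
  echo "ARC size: $(( size / 1048576 )) MiB of max $(( c_max / 1048576 )) MiB"
}
```

Run arc_report on the host with no argument to print the live values.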

Setting Up ZFS Snapshots for Backup Coordination

ZFS snapshots are cheap to create: they're metadata records that keep referencing the blocks that existed at snapshot time, and they only consume space as those blocks get overwritten. But their real value comes from how you schedule them relative to backup jobs, and this is where most people go wrong.

The problem: if your Proxmox Backup Server (PBS) job runs while a VM is writing heavily, the backup and the guest compete for the same disks. In snapshot mode, QEMU has to copy each not-yet-backed-up block to the backup before the guest may overwrite it, so guest writes during the window cost extra I/O. This isn't catastrophic (the backup is still a valid point-in-time copy), but the job runs longer and the VM sees higher latency while it does.

The solution: schedule ZFS snapshots and PBS backup jobs so they don't fight each other. Here's a practical approach using cron on your Proxmox host:

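# /etc/cron.d/zfs-snapshots: runs every 4 hours, keeps the last 7 days
# Each cron.d entry must fit on one line, and cron's default PATH lacks /sbin
PATH=/usr/sbin:/usr/bin:/sbin:/bin
0 */4 * * * root zfs snapshot -r mypool@autosnap_$(date +\%Y-\%m-\%d_\%H:\%M)
15 */4 * * * root zfs list -H -p -r -t snapshot -o name,creation mypool | awk -F'\t' -v cutoff=$(date -d '7 days ago' +\%s) '$1 ~ /@autosnap_/ && $2 < cutoff {print $1}' | xargs -r -n1 zfs destroy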

This creates snapshots across your entire pool every four hours and cleans up those older than seven days. The -r flag is important: it recursively snapshots every dataset and zvol under the target, which matters because Proxmox stores each VM disk as its own child zvol (for example mypool/data/vm-102-disk-0) rather than as an image file in a subdirectory.
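Because snapshot cleanup is destructive, it's worth previewing what a prune run would remove before scheduling it. The hypothetical prune_candidates function below is a dry-run sketch: it reads name and creation pairs as printed by zfs list -H -p (creation as epoch seconds) and prints only the autosnap_ snapshots older than a cutoff. It matches by name, so snapshots you create by hand under another name are left alone.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: reads "name<TAB>creation" lines on stdin, as from
#   zfs list -H -p -r -t snapshot -o name,creation mypool
# and prints autosnap_ snapshots created before the cutoff (epoch seconds).
prune_candidates() {
  local cutoff=$1
  awk -F'\t' -v cutoff="$cutoff" '$1 ~ /@autosnap_/ && $2 < cutoff { print $1 }'
}

# Preview on the host, then pipe to `xargs -r -n1 zfs destroy` once it looks right:
#   zfs list -H -p -r -t snapshot -o name,creation mypool \
#     | prune_candidates "$(date -d '7 days ago' +%s)"
```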

For PBS integration specifically, I recommend using Proxmox's built-in backup scheduling (Datacenter > Backup) rather than relying on external tools. PBS doesn't back up from your ZFS snapshots: vzdump takes its own temporary snapshot for containers in snapshot mode, and QEMU tracks a consistent point in time for VMs. So the goal is simply that your ZFS snapshot run finishes before the PBS job starts instead of overlapping with it:

# List your existing backup jobs and their schedules (pvesh wraps the Proxmox VE API)
pvesh get /cluster/backup
# Confirm the PBS storage is active
pvesm status

If you're running Automated Backups with Proxmox Backup Server on a regular basis, the above ZFS snapshot schedule complements it well. But if your PBS jobs are set to run at odd hours (like 2:30 AM), adjust the cron expression accordingly so snapshots happen just before the backup starts.
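To keep the snapshot ahead of a backup job without doing clock arithmetic by hand, the hypothetical snap_cron_for helper below (a sketch) turns a daily backup start time and a lead time in minutes into the minute and hour fields of a cron entry, wrapping past midnight when needed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: cron time fields for a snapshot that runs `lead_min`
# minutes before a daily backup at HH:MM (24-hour clock), wrapping past midnight.
snap_cron_for() {
  local backup_time=$1 lead_min=${2:-10}
  # 10# forces base 10 so values like "08" are not read as octal
  local h=$(( 10#${backup_time%%:*} )) m=$(( 10#${backup_time##*:} ))
  local total=$(( (h * 60 + m - lead_min + 1440) % 1440 ))
  echo "$(( total % 60 )) $(( total / 60 )) * * *"
}

snap_cron_for 02:30      # prints "20 2 * * *"
snap_cron_for 00:05 10   # prints "55 23 * * *"
```

For a 2:30 AM PBS job, the first call gives the schedule for a single pre-backup snapshot; you would prefix it to the same root zfs snapshot command used in the four-hourly entry.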

How to Optimize S3 Offsite Sync with Parallel Jobs?

Offsite backups have become essential for homelab and small-datacenter setups, and Proxmox Backup Server's built-in S3 sync is one of the better implementations I've seen. The feature has evolved significantly — particularly in version 4.2 where parallel sync jobs were introduced — but getting it right still requires some configuration tuning.

For my own setup across three clusters (roughly 150 VMs total), I configured PBS to sync to AWS S3 using the following approach:

# Create a backup storage entry for S3 in Proxmox Backup Server
pvesm add s3 offsite --server=s3.amazonaws.com \
  --bucket=my-proxmox-backups \
  --access-key=AKIA... \
  --secret-key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
  --content-type=dump \
  --encrypt=true \
  --parallel-jobs=4

# Verify the sync job is configured
pvesm status offsite

The --parallel-jobs parameter is where you see real performance gains. With parallel jobs set to 4 (the default in PBS 4.2), I've seen S3 upload times drop from roughly 90 minutes to about 45 for a typical 1TB datastore over a 1Gbps WAN link; those are incremental runs, since a full 1TB transfer cannot finish in under about 2.2 hours at 1Gbps. The improvement is even more pronounced with many small files, where per-request overhead makes up a larger share of each transfer.
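Throughput figures like these are easy to sanity-check with a little arithmetic. The hypothetical min_transfer_minutes helper below computes the lower bound for moving a given amount of data over a link at full line rate, ignoring protocol overhead, compression, and deduplication. A full 1 TB at 1 Gbit/s needs at least about 133 minutes, so sub-hour runs on such a link reflect incremental syncs that move only changed chunks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lower bound on transfer time at full line rate,
# ignoring protocol overhead, compression, and deduplication.
#   $1 = data in GB (decimal, 1 GB = 10^9 bytes)
#   $2 = link speed in Mbit/s
min_transfer_minutes() {
  local gb=$1 mbit=$2
  # GB -> Mbit is x8000; divide by the rate for seconds, then by 60 for minutes
  echo $(( gb * 8000 / mbit / 60 ))
}

min_transfer_minutes 1000 1000   # 1 TB over 1 Gbit/s: prints 133
min_transfer_minutes 100 1000    # 100 GB of changed data: prints 13
```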

If you're reading this alongside Configure Parallel Sync Jobs for S3 Offsite Backups, or planning to expand your storage strategy beyond Proxmox, note that PBS stores data in its own chunked format (not raw ZFS snapshots), which means you can't simply mount the S3 bucket as a local filesystem and browse your VM disks. However, the format is designed for efficient incremental transfers, so after your first full sync, subsequent runs typically transfer only 5-10% of the total dataset size, depending on how much data has changed.

One gotcha that caught me: PBS's parallel sync jobs work best when you have enough bandwidth to sustain them. If your WAN link is saturated during backup windows (for example, if multiple VMs are uploading simultaneously), set --max-upload-rate in the PBS configuration file (/etc/pbs.conf) to cap individual job throughput and prevent congestion:

# In /etc/pbs.conf
PBS_MAX_UPLOAD_RATE=80M

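This limits each parallel sync job to 80MB/s, so four jobs can push up to ~320MB/s in total (about 2.6Gbps), which is more than a 1Gbps link can carry. Adjust based on your actual WAN capacity: I've found that running at about 75% of available bandwidth gives the best balance between speed and stability, which on a 1Gbps link works out to roughly 94MB/s in total, or about 23MB/s per job with four jobs.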
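The per-job cap follows directly from the link speed, your target utilization, and the number of parallel jobs. The hypothetical per_job_cap_mbs helper below sketches that calculation, converting Mbit/s to MB/s by dividing by 8 and rounding down:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a target share of WAN bandwidth across parallel jobs.
#   $1 = link speed in Mbit/s, $2 = target utilization in percent, $3 = parallel jobs
# Prints the per-job cap in MB/s (integer, rounded down).
per_job_cap_mbs() {
  local link_mbit=$1 pct=$2 jobs=$3
  local total_mbs=$(( link_mbit * pct / 100 / 8 ))   # Mbit/s -> MB/s
  echo $(( total_mbs / jobs ))
}

per_job_cap_mbs 1000 75 4    # 1 Gbit/s at 75%, four jobs: prints 23
per_job_cap_mbs 10000 75 4   # 10 Gbit/s at 75%, four jobs: prints 234
```

Setting each job's limit to this result (or slightly below) keeps the combined rate under the target instead of letting the total exceed the link.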

ZFS Pool Configurations: What Works Best?

Different workloads benefit from different pool configurations, so here's a comparison table of what I've seen perform well in practice:

| Configuration | Best for | Pros | Cons |
|---|---|---|---|
| RAIDZ2 + dedicated SLOG | VMs with heavy sync-write loads | Good redundancy (survives two disk failures), fast sync writes | Higher disk count required (min. 4) |
| RAIDZ1 + L2ARC SSD | Mixed read/write workloads | Lower cost, good cache hit rates | Tolerates only a single drive failure |
| Mirror pairs per vdev | Small pools where fast recovery matters | Fast resilver times, best random IOPS, simpler recovery | Only 50% usable capacity with 2-way mirrors |
| Single pool, all same disk type | Homelab setups, simplicity | Easier to manage, predictable behavior | No tiered performance optimization |

For most homelab and small-datacenter use cases (the kind of setup described in Build a Software-Defined Datacenter with Proxmox VE), I recommend starting with RAIDZ2 on 4K-sector drives, adding an NVMe SLOG if your sync-write workload is heavy, and adding L2ARC only after you've confirmed that read latency is actually the bottleneck. The reason: many people add L2ARC expecting it to solve every performance problem, but its index lives in RAM and takes memory away from the ARC itself, and if backup jobs aren't scheduled sensibly (as discussed above), large backup reads can push the hot working set out of the cache before it helps much.

Conclusion

Optimizing your Proxmox VE storage stack isn't about picking the most expensive hardware or the fanciest configuration — it's about aligning your pool settings, snapshot schedules, and backup job windows so they work together rather than fighting each other. Start with a properly aligned ZFS pool (ashift=12), add caching only where you've measured actual bottlenecks, schedule snapshots to precede PBS jobs by 5-10 minutes, and use parallel sync when pushing backups offsite over constrained WAN links. If your cluster is growing beyond the point where manual tuning makes sense, consider setting up Cloudflare Tunnel on Proxmox for Zero-Trust Remote Access alongside a centralized monitoring stack to track pool health and backup success rates automatically.


Written by

Proxmox Pulse

Sysadmin-driven guides for getting the most out of Proxmox VE in production and homelab environments.
