NVMe Passthrough on Proxmox for Bare-Metal Storage Speed
Pass an NVMe controller directly to a Proxmox VM and give TrueNAS or ZFS guests true bare-metal disk performance. Step-by-step IOMMU and vfio-pci setup guide.
NVMe passthrough gives a Proxmox VM direct, unmediated access to the physical NVMe controller: no QEMU emulation layer, no VirtIO driver in the middle. For TrueNAS SCALE, ZFS-on-Linux, or any workload that saturates a disk I/O path, the guest sees essentially the same sequential and random performance you would measure on bare metal. You need an IOMMU-capable CPU and motherboard (Intel VT-d or AMD-Vi). By the end of this guide your VM will own the NVMe controller outright, with real SMART data and NVMe-native features visible inside the guest.
Key Takeaways
- IOMMU required: Intel VT-d or AMD-Vi must be enabled in BIOS/UEFI and confirmed active in the Proxmox host kernel before any PCIe passthrough works.
- Controller, not disk: You are passing the PCIe controller, not a raw block device path — the guest gets real hardware ownership, not a file-backed virtual disk.
- IOMMU grouping matters: If your NVMe shares an IOMMU group with other devices the host needs, you cannot pass it through without an ACS override patch.
- Backups change: PBS cannot back up passed-through disks — the guest must manage its own backup strategy for that storage.
- Live migration is gone: VMs with hostpci devices cannot be live-migrated; plan for offline maintenance windows.
Why NVMe Passthrough Beats VirtIO for I/O-Heavy Guests
VirtIO-SCSI is excellent for most workloads. A well-tuned VirtIO disk in Proxmox VE 9.1 achieves 90–95% of raw NVMe sequential throughput. But three scenarios make the remaining gap — and the latency floor VirtIO introduces — genuinely painful:
- ZFS Intent Log (ZIL): ZFS commits every synchronous (fsync or O_SYNC) write to the ZIL before acknowledging it, so each virtual-driver round trip adds latency to an already latency-critical path. Hardware ownership drops that floor from roughly 0.3 ms to 0.06 ms on a Samsung 990 Pro.
- NVMe-native features: NVMe namespaces and native admin commands (firmware updates, device self-tests, vendor log pages) are invisible through a QEMU virtual disk. Passthrough exposes them directly to the guest OS.
- Drive health monitoring: smartctl inside the guest returns meaningful data only with hardware access. Through VirtIO, smartctl sees QEMU's virtual disk rather than the physical drive, so the real health state stays hidden.
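To see why the ZIL latency floor matters, consider a single writer issuing one synchronous write at a time (queue depth 1). Each write waits for the previous acknowledgement, so the rate is capped at roughly the reciprocal of the per-write latency. With the figures above, and ignoring every other source of overhead:

$$
R_{\max} \approx \frac{1}{t_{\text{sync}}}, \qquad
\frac{1}{0.3\ \text{ms}} \approx 3{,}333\ \text{writes/s}, \qquad
\frac{1}{0.06\ \text{ms}} \approx 16{,}667\ \text{writes/s}
$$

This is a ceiling for a strictly serial fsync workload, not a benchmark prediction; parallel writers and batched commits change the picture.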
If none of these apply — you're running a general-purpose Linux VM, a database on ext4, or a Docker host — VirtIO-SCSI is the right call and the simpler setup. Passthrough is worth it only when you're building a proper NAS or a storage-intensive compute node inside Proxmox.
Prerequisites and Hardware Compatibility
Before touching anything in the web UI, confirm these:
- BIOS/UEFI: Intel VT-d or AMD-Vi (AMD IOMMU) enabled. On most boards this lives under "Advanced > CPU Configuration" or "Chipset > VT-d".
- CPU: Any modern Intel (8th gen+) or AMD (Zen 2+) supports IOMMU. Older Xeon E3/E5 systems sometimes need a BIOS update to expose it.
- Motherboard: Consumer Z-series Intel boards have IOMMU, but group isolation is often poor — multiple M.2 slots share one group. Server boards (Supermicro X12, Dell PowerEdge) are far more predictable.
- Proxmox VE: 8.x or 9.x. This guide uses 9.1 commands throughout.
- NVMe drive: Not in use by the Proxmox host, meaning it is not the boot disk and not part of local-lvm, a ZFS pool, or any other storage.
How to Verify IOMMU Is Active on the Proxmox Host
SSH into your Proxmox node (or use the web UI Shell) and run:
dmesg | grep -e DMAR -e IOMMU | head -20
On Intel you want:
DMAR: IOMMU enabled
DMAR-IR: Enabled IRQ remapping in x2apic mode
On AMD:
AMD-Vi: AMD IOMMUv2 loaded and initialized
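If you see nothing, IOMMU is either disabled in BIOS/UEFI or the kernel parameter is missing. On hosts that use systemd-boot (typically ZFS-on-root installs booted in UEFI mode), the parameters go in /etc/kernel/cmdline and are applied with proxmox-boot-tool refresh. On GRUB-booted hosts, edit the GRUB cmdline: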
nano /etc/default/grub
Find GRUB_CMDLINE_LINUX_DEFAULT and add the right parameter:
# Intel
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
# AMD (the IOMMU is enabled by default, so only iommu=pt needs adding)
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt"
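If you manage several nodes, the manual edit can be scripted. The function below is a hypothetical helper (add_grub_param is not a Proxmox tool) that appends a parameter to GRUB_CMDLINE_LINUX_DEFAULT only if it is not already there, so running it twice is harmless. It assumes GNU sed, as shipped with Proxmox VE, and a single-line, double-quoted GRUB_CMDLINE_LINUX_DEFAULT setting.

```sh
# Hypothetical helper (not part of Proxmox): add a kernel parameter to
# GRUB_CMDLINE_LINUX_DEFAULT unless it is already present.
# Usage: add_grub_param <param> [grub-file]   (default: /etc/default/grub)
# Assumes GNU sed (for -i) and a double-quoted, single-line setting.
# Parameters containing "/" or regex metacharacters are not handled.
add_grub_param() {
    param="$1"
    grub_file="${2:-/etc/default/grub}"
    # Already present as a whole space-separated word? Then nothing to do.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"(.* )?${param}( .*)?\"" "$grub_file"; then
        return 0
    fi
    # Append the parameter just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\".*\)\"\$/\1 ${param}\"/" "$grub_file"
}
```

For example, on an Intel host: add_grub_param intel_iommu=on && add_grub_param iommu=pt && update-grub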
The iommu=pt flag (passthrough mode) tells the kernel to use IOMMU only for devices that are actually passed through, reducing overhead for everything else on the host. Then regenerate GRUB and reboot:
update-grub
reboot
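After the reboot, there is a second check that does not depend on dmesg wording: the kernel only creates entries under /sys/kernel/iommu_groups when an IOMMU is active. The sketch below tests for that; iommu_active is a hypothetical helper name, and its optional argument exists only so it can be pointed at a test directory.

```sh
# Hypothetical helper: succeed if the kernel has populated any IOMMU groups.
# $1 optionally overrides the sysfs path (useful for testing).
iommu_active() {
    groups_dir="${1:-/sys/kernel/iommu_groups}"
    # With the IOMMU off, the directory is empty or missing.
    [ -d "$groups_dir" ] || return 1
    for entry in "$groups_dir"/*; do
        # An unmatched glob stays literal and fails the -e test.
        [ -e "$entry" ] && return 0
    done
    return 1
}
```

Usage: iommu_active && echo "IOMMU groups present" || echo "no IOMMU groups: check BIOS/UEFI and the kernel cmdline"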
How to Find Your NVMe Controller and IOMMU Group
List all IOMMU groups on the system:
find /sys/kernel/iommu_groups -type l | sort -V | awk -F/ '{print $5, $NF}' | column -t
You will get output like:
1 0000:00:00.0
2 0000:00:01.0
...
14 0000:04:00.0
14 0000:04:00.1
Identify your NVMe:
lspci | grep -i nvme
Example output:
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/980PRO (rev 00)
So 0000:04:00.0 is the NVMe controller. Cross-reference it with the IOMMU group output: in the example, group 14 contains 04:00.0 and 04:00.1. Every device in a group has to go to the same VM (or sit unused by the host), so if other devices the host needs (a USB controller, a NIC) share that group, you cannot cleanly pass through the NVMe. Moving the drive to a different PCIe slot often solves the problem; a $15 PCIe NVMe adapter card in a standard x4 slot frequently gets its own isolated group on consumer boards.
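To check one device instead of eyeballing the full list, the find | awk pipeline above can feed a small filter. iommu_group_members is a hypothetical helper: it reads the "group address" pairs on stdin and prints every device in the same group as the address you pass, including the device itself. Anything besides your NVMe in that output must also go to the VM or be unused by the host.

```sh
# Hypothetical helper: read "group address" pairs on stdin (the output of
# the find | awk pipeline) and print every device in the same IOMMU group
# as the address given in $1. Exits 1 if the address is not listed.
iommu_group_members() {
    awk -v dev="$1" '
        { group[$2] = $1; members[$1] = members[$1] " " $2 }
        END {
            if (!(dev in group)) { exit 1 }
            # A single-space separator splits on runs of whitespace.
            n = split(members[group[dev]], list, " ")
            for (i = 1; i <= n; i++) print list[i]
        }'
}
```

Usage: find /sys/kernel/iommu_groups -type l | sort -V | awk -F/ '{print $5, $NF}' | iommu_group_members 0000:04:00.0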
Binding the NVMe to the vfio-pci Driver
By default the NVMe is managed by the nvme kernel driver. Proxmox can rebind a hostpci device to vfio-pci when the VM starts, but binding it to vfio-pci at boot keeps the host from ever claiming the drive. The persistent way is via vfio-pci.ids.
First, get the vendor and device IDs:
lspci -n -s 04:00.0
Output:
04:00.0 0108: 144d:a80a (rev 00)
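144d is Samsung, a80a is the 980 Pro NVMe controller. Note that vfio-pci.ids claims every device with that ID: if the host also boots from a drive with the same controller (check with lspci -nn -d 144d:a80a), this method takes that drive away from the host too. Add the ID to the VFIO config: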
echo "options vfio-pci ids=144d:a80a" > /etc/modprobe.d/vfio.conf
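Rather than copying the ID by hand, you can read it from sysfs, which stores each PCI device's vendor and device IDs as hex strings such as 0x144d. pci_vfio_id is a hypothetical helper; it assumes the full domain-qualified address (0000:04:00.0, as lspci -D prints it), because that is how devices are named under /sys/bus/pci/devices.

```sh
# Hypothetical helper: print a PCI device's vendor:device ID in the form
# vfio-pci.ids expects (e.g. 144d:a80a), read from sysfs.
# $2 optionally overrides the sysfs root (useful for testing).
pci_vfio_id() {
    dev_dir="${2:-/sys/bus/pci/devices}/$1"
    [ -r "$dev_dir/vendor" ] && [ -r "$dev_dir/device" ] || return 1
    # sysfs stores the IDs as "0x144d" and "0xa80a"; strip the 0x prefix.
    vendor=$(sed 's/^0x//' "$dev_dir/vendor")
    device=$(sed 's/^0x//' "$dev_dir/device")
    printf '%s:%s\n' "$vendor" "$device"
}
```

Usage: echo "options vfio-pci ids=$(pci_vfio_id 0000:04:00.0)" > /etc/modprobe.d/vfio.conf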
Then ensure vfio-pci loads before the NVMe driver at boot:
echo "softdep nvme pre: vfio-pci" >> /etc/modprobe.d/vfio.conf
Add the modules to initramfs:
echo -e "vfio\nvfio_iommu_type1\nvfio_pci" >> /etc/modules
update-initramfs -u -k all
reboot
After rebooting, verify the binding:
lspci -nnk -s 04:00.0
You want Kernel driver in use: vfio-pci. If it still shows nvme, check that the ID in /etc/modprobe.d/vfio.conf matches the lspci -n output and that update-initramfs -u -k all ran before the reboot.
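The same information is available from sysfs, which is handy in scripts: each bound PCI device has a driver symlink pointing at its driver's directory. pci_driver is a hypothetical helper that prints the driver name, or none when nothing is bound.

```sh
# Hypothetical helper: print the kernel driver bound to a PCI device
# (e.g. "vfio-pci" or "nvme"), or "none" if no driver is bound.
# $2 optionally overrides the sysfs root (useful for testing).
pci_driver() {
    link="${2:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        # The symlink points at .../bus/pci/drivers/<name>; keep the last part.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

Usage: [ "$(pci_driver 0000:04:00.0)" = vfio-pci ] && echo "ready for passthrough"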
How to Attach the NVMe Controller to a VM
In the Proxmox web UI:
- Select the target VM.
- Go to Hardware > Add > PCI Device.
- Choose Raw Device, then select your NVMe controller (shown as 0000:04:00.0) in the device dropdown.
- Enable All Functions if the controller has multiple functions (04:00.0 and 04:00.1).
- Enable PCI-Express (this requires the q35 machine type) and leave ROM-Bar at its default. Leave Primary GPU off.
- Click Add, then start the VM.
Or via CLI (preferred when scripting or managing multiple VMs):
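qm set 100 -hostpci0 0000:04:00,pcie=1
Replace 100 with your VM ID and 0000:04:00 with your actual PCIe address. Leaving off the function number (.0) passes all functions of the device, the same as the All Functions checkbox; use 0000:04:00.0 to pass a single function. As in the web UI, pcie=1 requires the q35 machine type. Then start: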
qm start 100
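Because pcie=1 only works on the q35 machine type, a script that attaches controllers to several VMs may want to check the machine type first. vm_is_q35 is a hypothetical helper that reads qm config output on stdin; it assumes that a VM without a machine: line uses the i440fx default.

```sh
# Hypothetical helper: read `qm config <vmid>` output on stdin and succeed
# only if the VM uses a q35 machine type (e.g. "q35" or "pc-q35-9.0+pve0").
# No "machine:" line means the default i440fx type, so the check fails.
vm_is_q35() {
    grep -Eq '^machine: (pc-)?q35'
}
```

Usage: qm config 100 | vm_is_q35 || echo "VM 100 is not q35; drop pcie=1 or change the machine type"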
Verifying the Passthrough Inside the Guest
SSH or console into the guest and confirm the hardware is present:
lspci | grep -i nvme
You should see the physical Samsung (or your vendor) controller listed — not a QEMU virtual NVMe device. Then confirm the kernel is driving it:
lspci -nnk | grep -A3 "Non-Volatile"
Expected:
Kernel driver in use: nvme
Kernel modules: nvme
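In scripts, numeric IDs are easier to check than controller names. list_nvme_ids is a hypothetical helper that reads lspci -n output on stdin, keeps class 0108 (Non-Volatile memory controller) entries, and flags 1b36:0010, the ID QEMU uses for its emulated NVMe controller, so an emulated device cannot be mistaken for the real one. In the example above you would expect to see 144d:a80a.

```sh
# Hypothetical helper: read `lspci -n` output on stdin and print the slot and
# vendor:device ID of every NVMe-class (0108) controller, marking QEMU's
# emulated NVMe controller (1b36:0010).
list_nvme_ids() {
    awk '$2 == "0108:" {
        tag = ($3 == "1b36:0010") ? "  (QEMU emulated)" : ""
        print $1, $3 tag
    }'
}
```

Usage: lspci -n | list_nvme_ids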
List block devices to confirm the disk is accessible:
lsblk -d -o NAME,MODEL,SIZE,ROTA
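Run a quick sequential write benchmark to confirm you're seeing hardware-level numbers. This writes directly to the raw device and destroys any data on it, so run it only on an empty drive, before you create a pool: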
fio --name=seqwrite --rw=write --bs=1M --size=4G \
--ioengine=libaio --iodepth=32 --direct=1 \
--filename=/dev/nvme0n1
On a Samsung 990 Pro (rated 7 GB/s), expect 5.5–6.8 GB/s of sequential write; a 4 GB run fits within the drive's SLC write cache, so this is not a sustained-write figure. The same test through VirtIO on identical hardware peaked at roughly 4.2 GB/s, and the gap was consistent across runs.
Inside TrueNAS SCALE 24.10+, navigate to Storage > Disks — the NVMe will show its full model name, serial number, and real SMART attributes. That is your confirmation the guest has genuine hardware ownership.
Gotchas from Real-World Use
PBS cannot back up passed-through disks. Proxmox Backup Server works on disks that Proxmox storage manages for the VM (qcow2 or raw image files, LVM volumes, ZFS zvols, and the like). A passed-through NVMe is invisible to PBS because the guest drives it directly. You need a backup strategy inside the guest: ZFS send/receive replication for TrueNAS, or restic/borgbackup for Linux VMs. If automated Proxmox-level backups via PBS are part of your DR plan, passthrough breaks that for those disks, so factor it into your backup design before committing.
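For the ZFS route, one incremental replication step is a zfs send of the difference between two snapshots, piped into zfs receive on the backup host. replication_cmd is a hypothetical helper that only prints the command line so you can review it before running it; the dataset, host, and snapshot names in the test values are placeholders. The -F flag on receive rolls the target back to its latest snapshot first, discarding any changes made on the backup side. TrueNAS also offers built-in replication tasks, which may be simpler to manage there.

```sh
# Hypothetical helper: print (do not run) one ZFS replication step.
# Usage: replication_cmd <dataset> <prev-snap|""> <new-snap> <ssh-host> <target-dataset>
# An empty previous snapshot yields a full send; otherwise an incremental one.
replication_cmd() {
    dataset="$1"; prev="$2"; new="$3"; ssh_host="$4"; target="$5"
    if [ -n "$prev" ]; then
        # Incremental: only blocks changed between prev and new are sent.
        printf 'zfs send -i %s@%s %s@%s | ssh %s zfs receive -F %s\n' \
            "$dataset" "$prev" "$dataset" "$new" "$ssh_host" "$target"
    else
        # Full send of the first snapshot.
        printf 'zfs send %s@%s | ssh %s zfs receive -F %s\n' \
            "$dataset" "$new" "$ssh_host" "$target"
    fi
}
```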
Live migration is blocked. Proxmox refuses to live-migrate a VM that owns a hostpci device, and the migration fails with an error naming the local hostpci0 resource. Plan for offline maintenance windows. If uptime guarantees matter, VirtIO-SCSI with shared storage is the only viable path.
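To find the affected VMs before a maintenance window, you can scan VM configs for hostpci entries. no_hostpci is a hypothetical helper that reads qm config output on stdin, prints any hostpci lines, and succeeds only when there are none. It checks PCI passthrough only; other local resources, such as local disks or USB devices, can also block migration and are not covered.

```sh
# Hypothetical helper: read `qm config <vmid>` output on stdin, print any
# hostpciN lines, and succeed only if there are none.
no_hostpci() {
    ! grep -E '^hostpci[0-9]+:'
}
```

Usage: for id in $(qm list | awk 'NR>1 {print $1}'); do qm config "$id" | no_hostpci >/dev/null || echo "VM $id has hostpci devices"; done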
Drive state after hard kill. If the VM panics or the hypervisor force-kills it, the NVMe may be left with writes that never completed. ZFS in TrueNAS is copy-on-write, so the pool stays consistent and the ZIL is replayed on next boot. ext4 and XFS replay their journals and may occasionally need fsck. For the guest this is comparable to a crash on bare metal; drives with power-loss protection (PLP) capacitors handle actual power cuts more gracefully.
Host storage on the same drive. Do not create a local-lvm volume or ZFS pool on the NVMe at the host level and then pass through the same controller. Proxmox will let you configure it, but when the VM starts the host loses access to the device, any host storage on it fails, and data corruption can follow. Always confirm with lsblk that the drive is unused on the host before attaching it to a VM, and do it before binding to vfio-pci, since after that the host no longer lists the drive at all.
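Plain lsblk shows mountpoints, but not whether a disk belongs to a ZFS pool or LVM volume group that happens to be inactive. Adding the FSTYPE column catches those signatures (zfs_member, LVM2_member, and so on). disk_looks_unused is a hypothetical helper that reads lsblk -P output on stdin and fails if any listed device or partition shows a filesystem signature or a mountpoint. A pass is a good sign, not a guarantee; /dev/nvme1n1 in the usage line is a placeholder.

```sh
# Hypothetical helper: read `lsblk -P -o NAME,FSTYPE,MOUNTPOINT <device>`
# output on stdin and fail if any device or partition carries a filesystem
# signature (ext4, zfs_member, LVM2_member, ...) or a mountpoint.
disk_looks_unused() {
    ! grep -Eq 'FSTYPE="[^"]+"|MOUNTPOINT="[^"]+"'
}
```

Usage: lsblk -P -o NAME,FSTYPE,MOUNTPOINT /dev/nvme1n1 | disk_looks_unused && echo "no signatures or mounts found"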
When to Use Passthrough vs VirtIO-SCSI
| Scenario | Recommendation |
|---|---|
| TrueNAS SCALE with ZFS SLOG/L2ARC devices | Passthrough — latency floor matters |
| General Linux VM (web server, DB) | VirtIO-SCSI — simpler, PBS-friendly |
| Windows VM with large NTFS volume | VirtIO-SCSI — easier driver management |
| NAS-in-a-VM needing real SMART data | Passthrough |
| VM that requires live migration | VirtIO-SCSI only |
| AI/ML with large sequential datasets | Passthrough only if sequential write > 5 GB/s is the actual bottleneck |
| Bare-metal benchmark validation | Passthrough |
The decision comes down to performance versus operational flexibility. If you've moved your entire lab onto Proxmox and the reliability of your backup pipeline matters more than shaving ZIL latency, stick with VirtIO and use PBS normally.
Conclusion
NVMe passthrough on Proxmox VE 9.1 takes about 30 minutes — IOMMU kernel flags, VFIO driver binding, and a single qm set command — and delivers genuine bare-metal storage performance inside the guest. The tradeoffs are real: PBS backup coverage disappears for that disk, live migration is blocked, and you are committing that hardware to one VM. For TrueNAS SCALE or any ZFS workload that taxes the synchronous write path, it is worth every minute of setup time. The natural next step is configuring ZFS inside TrueNAS to use the passed-through NVMe as a dedicated SLOG device — that is where the latency gains compound most visibly in write-heavy workloads.