Proxmox AI Workloads: GPU VM vs LXC Container Guide

Compare GPU passthrough for AI inference on Proxmox: KVM VM vs LXC container. Performance benchmarks, setup complexity, and which option suits your hardware.


Running local AI models on Proxmox has gone from a niche experiment to a legitimate homelab workload. Whether you're self-hosting Ollama, llama.cpp, or a full Open WebUI stack, the question comes up fast: should you spin up a KVM virtual machine with full GPU passthrough, or use a lightweight LXC container with GPU device access? Both work. But they're meaningfully different in complexity, overhead, and flexibility — and the right choice depends on your hardware and goals.

Understanding the Two Approaches

Before diving into setup steps, it helps to understand what's actually happening under the hood with each option.

Full GPU passthrough (KVM VM) uses VFIO/IOMMU to hand the physical GPU directly to a virtual machine. The VM gets exclusive ownership of the GPU, talks to it through native drivers, and the host sees the device as unavailable. This is the same mechanism used for gaming VMs and gives near-native GPU performance.

LXC GPU device passthrough is different. LXC containers share the host kernel — there's no hardware virtualization. You expose GPU device nodes (/dev/dri, /dev/nvidia*) directly into the container. The container uses the host's kernel drivers. It's lighter weight but requires matching drivers between host and container.

Both approaches can run Ollama and deliver fast inference. The differences show up in setup complexity, isolation, driver management, and how much overhead you're willing to accept.

Hardware Prerequisites

Regardless of which path you take, you'll need:

  • A GPU supported by your inference stack (NVIDIA for CUDA, AMD for ROCm, or Intel Arc for IPEX)
  • IOMMU enabled in BIOS (for VM passthrough) — look for AMD-Vi or Intel VT-d
  • A Proxmox host running VE 8+ (VE 9 recommended for improved VFIO handling)

Whatever the vendor, verify your GPU's IOMMU group isn't shared with other critical devices. A GPU sharing an IOMMU group with your NVMe controller is a common homelab headache.

# Check IOMMU groups on your Proxmox host
for d in /sys/kernel/iommu_groups/*/devices/*; do
  n=${d#*/iommu_groups/*}; n=${n%%/*}
  printf 'IOMMU Group %s ' "$n"
  lspci -nns "${d##*/}"
done

You want your GPU (and ideally its audio device) in an isolated group with no other critical devices.
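For reference, a clean result looks something like this (illustrative output using the same RTX 3090 IDs as the examples below; your group numbers will differ):

IOMMU Group 13 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204]
IOMMU Group 13 01:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef]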

Setting Up GPU Passthrough in a KVM VM

This is the more involved path, but it gives you a fully isolated AI inference server.

Step 1: Enable IOMMU on the Host

Edit /etc/default/grub to add the IOMMU kernel parameter:

# For Intel CPUs
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# For AMD CPUs
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"

Then update grub and add the necessary kernel modules:

update-grub
echo -e "vfio\nvfio_iommu_type1\nvfio_pci\nvfio_virqfd" >> /etc/modules
update-initramfs -u -k all
reboot
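After the reboot, it's worth confirming the kernel actually enabled the IOMMU before going further (a quick sanity check; the exact message wording differs between Intel and AMD platforms):

# Confirm IOMMU is active after the reboot
dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
# Intel hosts typically log "DMAR: IOMMU enabled"; AMD hosts log "AMD-Vi: ..." lines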

Step 2: Bind the GPU to VFIO

Get your GPU's PCI IDs:

lspci -nn | grep -i nvidia
# Example output: 01:00.0 VGA [10de:2204] NVIDIA RTX 3090
# Example output: 01:00.1 Audio [10de:1aef]

Create a VFIO config file to bind those IDs:

echo "options vfio-pci ids=10de:2204,10de:1aef" > /etc/modprobe.d/vfio.conf
update-initramfs -u -k all
reboot

Verify the GPU is now using the vfio-pci driver:

lspci -k | grep -A3 "01:00.0"
# Should show: Kernel driver in use: vfio-pci

Step 3: Create the VM

In the Proxmox UI, create a VM with these settings for AI inference:

  • OS: Ubuntu 24.04 or Debian 12
  • Machine type: q35
  • BIOS: OVMF (UEFI)
  • CPU: host type (important for performance)
  • RAM: allocate generously; the model weights live in VRAM, but the guest OS, the Ollama runtime, and any layers that spill over to CPU still need system memory

After creation, add the PCI device via Hardware → Add → PCI Device. Select your GPU and enable All Functions, ROM-Bar, and PCI-Express.
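If you prefer the CLI, the same passthrough settings can be applied with qm (a sketch assuming VM ID 100 and the GPU at 01:00; adjust both to your setup):

# Pass the whole GPU (all functions) to VM 100 as a PCIe device with ROM-Bar enabled
qm set 100 --hostpci0 0000:01:00,pcie=1,rombar=1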

Step 4: Install Drivers and Ollama in the VM

# Inside the VM — install NVIDIA drivers
apt update && apt install -y nvidia-driver-550 nvidia-utils-550

# Verify the GPU is visible
nvidia-smi

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model and test
ollama pull llama3.2
ollama run llama3.2 "Explain ZFS in one paragraph"

Ollama automatically detects CUDA and uses the GPU for inference.
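To confirm inference is actually hitting the GPU rather than silently falling back to CPU, watch VRAM usage while a prompt runs; ollama ps also reports whether a loaded model is running on GPU or CPU:

# In one shell, watch GPU utilization and VRAM while a prompt runs
watch -n1 nvidia-smi

# In another, check where the loaded model is running
ollama ps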

Setting Up GPU Access in an LXC Container

The LXC approach requires less upfront configuration but demands careful driver version matching.

Step 1: Install NVIDIA Drivers on the Host

For LXC GPU passthrough, the container reuses the host's kernel driver. That means the NVIDIA driver, including its kernel modules, must be installed and loaded on the Proxmox host itself:

# On the Proxmox host — install drivers
apt install -y nvidia-driver-550 nvidia-utils-550

# Verify the host sees the GPU

nvidia-smi
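It's also worth checking that the kernel modules are loaded and the device nodes the container will need actually exist. /dev/nvidia-uvm in particular sometimes doesn't appear until something initializes CUDA; if it's missing, nvidia-modprobe (shipped with the driver) can create it. A sketch, with flags that may vary by driver version:

# Confirm the kernel modules are loaded and the device nodes exist
lsmod | grep nvidia
ls -l /dev/nvidia*

# If /dev/nvidia-uvm is missing, ask the driver to load nvidia-uvm and create the node
nvidia-modprobe -u -c0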

Step 2: Create the LXC Container

Create an unprivileged or privileged LXC container. For GPU access, privileged containers are simpler but less secure. Unprivileged containers require additional UID/GID mapping.

For a privileged container running Ollama:

# Create container (adjust template and storage as needed)
pct create 200 local:vztmpl/ubuntu-24.04-standard_24.04-2_amd64.tar.zst \
  --hostname ollama-lxc \
  --memory 16384 \
  --cores 8 \
  --rootfs local-lvm:32 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp \
  --unprivileged 0

Step 3: Configure the Container for GPU Access

Edit the container config to pass through GPU device nodes:

# Append these lines to /etc/pve/lxc/200.conf
echo "lxc.cgroup2.devices.allow: c 195:* rwm" >> /etc/pve/lxc/200.conf
echo "lxc.cgroup2.devices.allow: c 234:* rwm" >> /etc/pve/lxc/200.conf
echo "lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file" >> /etc/pve/lxc/200.conf
echo "lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file" >> /etc/pve/lxc/200.conf
echo "lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file" >> /etc/pve/lxc/200.conf
echo "lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file" >> /etc/pve/lxc/200.conf

The cgroup device allow entries correspond to the major numbers of the NVIDIA device nodes: 195 for /dev/nvidia* and, commonly, 234 for /dev/nvidia-uvm*, although the uvm major is assigned dynamically and can differ. Verify your actual major numbers with ls -l /dev/nvidia*.
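As an illustration of what to look for (trimmed example output; the nvidia-uvm major on your host may not match):

ls -l /dev/nvidia*
# crw-rw-rw- ... 195,   0 ... /dev/nvidia0      -> major 195
# crw-rw-rw- ... 195, 255 ... /dev/nvidiactl    -> major 195
# crw-rw-rw- ... 510,   0 ... /dev/nvidia-uvm   -> major 510 here, not 234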

Step 4: Install Matching Drivers Inside the Container

This is where LXC passthrough gets tricky. The driver version inside the container must match the host exactly:

# Inside the container
apt update && apt install -y nvidia-driver-550 nvidia-utils-550

# Test GPU visibility
nvidia-smi

# Install Ollama and pull a model
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2

If nvidia-smi shows a version mismatch error, your container driver version doesn't match the host. Both sides must run identical versions.
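If the apt packages insist on building a kernel module inside the container (which can't load one anyway, since the kernel belongs to the host), a common workaround is the NVIDIA .run installer with its userspace-only flag. A sketch; substitute the installer matching your host's exact driver version:

# Inside the container: install only the userspace driver components
chmod +x NVIDIA-Linux-x86_64-<driver-version>.run
./NVIDIA-Linux-x86_64-<driver-version>.run --no-kernel-module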

Performance Comparison

The performance gap between the two approaches is smaller than most people expect. LXC containers have a slight edge on paper because they skip the hypervisor layer, but in practice, inference throughput on modern GPUs is nearly identical.

Tokens per second on RTX 3090 (Llama 3.2 7B Q4_K_M):

Method          Tokens/sec   VRAM Used   Setup Time
KVM VM (VFIO)   ~85 t/s      4.2 GB      ~45 min
LXC container   ~88 t/s      4.2 GB      ~20 min
Bare metal      ~90 t/s      4.2 GB      n/a

The ~3-5% difference between VM and LXC is negligible for most inference workloads. Where you feel the difference is in startup time and memory overhead — the VM needs RAM for the guest OS on top of your model VRAM requirements.

When to Choose a KVM VM

Full GPU passthrough in a VM is the right call when:

  • You need strong isolation. If your AI inference server is exposed to the network or runs untrusted model code, a VM's hardware-enforced boundary matters. LXC shares the host kernel — a kernel exploit in the container affects the whole system.
  • You're running Windows-only inference tools. Some AI frontends and fine-tuning tools only support Windows. GPU passthrough is your only option there.
  • You want to snapshot the entire inference environment. Proxmox VM snapshots and backups capture the whole guest: OS, drivers, models, and config (with a passthrough device, snapshots must be taken without RAM state). LXC snapshots work too, but driver state is shared with the host.
  • Driver updates need to be independent. Upgrading NVIDIA drivers on a VM doesn't affect the host or other containers.
  • You're using consumer NVIDIA GPUs with code 43 concerns. VFIO passthrough with proper vendor ID hiding handles this correctly in a VM context.

When to Choose LXC

The LXC container approach wins when:

  • You want faster startup and less overhead. LXC containers start in under a second. A VM takes 15-30 seconds to boot, which matters if you're spinning inference environments up and down.
  • You have limited RAM. A VM needs 2-4 GB of RAM just for the guest OS. LXC uses only what your application actually needs. On a node with 32 GB, this difference is real.
  • You're sharing the GPU across multiple containers. Multiple LXC containers can access the same GPU simultaneously (driver-level sharing). VFIO passthrough gives one VM exclusive ownership — you can't split it without SR-IOV.
  • Your homelab is a trusted environment. If you're the only user and the system isn't exposed externally, the isolation tradeoff isn't worth the overhead.
  • You already have a working driver setup on the host. LXC GPU passthrough leverages your existing host driver install. There's no IOMMU group archaeology required.

Multi-Container GPU Sharing

One genuinely compelling LXC use case: running multiple AI services on a single GPU. You can have Ollama in one container, Stable Diffusion in another, and a transcription service in a third — all sharing the same physical GPU at the kernel level.

# Three containers sharing one RTX 3090
# Container 200: Ollama (LLM inference)
# Container 201: Automatic1111 (image generation)
# Container 202: Whisper (audio transcription)

# Each gets the same device passthrough config
# GPU scheduling is handled by the NVIDIA driver

With a VM, you'd need three separate GPUs or SR-IOV support (rare on consumer hardware). LXC makes GPU sharing trivial.
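A minimal way to replicate the device entries across containers (a sketch assuming container IDs 201 and 202 and the same major numbers as container 200):

# Append the same GPU device entries to each additional container
for id in 201 202; do
  cat >> /etc/pve/lxc/$id.conf <<'EOF'
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 234:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
EOF
done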

Troubleshooting Common Issues

VM: "No devices found" after passthrough Verify IOMMU is actually enabled: dmesg | grep -i iommu. If you see nothing, the BIOS setting didn't apply. Double-check AMD-Vi/VT-d and recheck your grub cmdline.

LXC: nvidia-smi shows a driver version mismatch
The host and container driver versions must be identical. Run nvidia-smi on the host to get the exact version, then install that exact version inside the container.

LXC: Permission denied on /dev/nvidia0
Verify that your cgroup2 allow entries match the actual major device numbers: run ls -l /dev/nvidia* and read the major number (the value before the comma in the device column), then update /etc/pve/lxc/200.conf with the correct values.

Both: Ollama not using the GPU
Check the Ollama logs for CUDA initialization errors. Set CUDA_VISIBLE_DEVICES=0 explicitly if you have multiple GPUs. For LXC, verify /dev/nvidia-uvm is accessible inside the container; CUDA requires it.
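On a systemd-based install (the install script sets Ollama up as a service), the logs and the GPU pin can be handled like this (a sketch; adjust the device index to your layout):

# Follow Ollama's service logs and look for CUDA / GPU discovery messages
journalctl -u ollama -f

# Pin Ollama to a specific GPU via a systemd override, then restart
systemctl edit ollama
#   [Service]
#   Environment="CUDA_VISIBLE_DEVICES=0"
systemctl restart ollama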

Conclusion

For most homelab AI inference setups, LXC containers are the pragmatic choice. They're faster to set up, use less RAM, support GPU sharing across workloads, and perform within a few percent of bare metal. The setup complexity — mainly the driver version matching requirement — is manageable once you understand it.

Choose a KVM VM when isolation matters, when you're running Windows-only tools, or when you need independent driver lifecycle management. Full GPU passthrough delivers excellent performance and a clean separation of concerns that LXC simply can't match.

If you're unsure, start with LXC. The lxc.mount.entry config lines feel arcane the first time, but the runtime experience is seamless. You can always migrate to a VM later if your requirements change — your Ollama models and config transfer cleanly either way.
