Run Ollama on Proxmox LXC with AMD GPU Access
Expose an AMD GPU to a Proxmox LXC container for local AI inference with Ollama—no full VM passthrough required. Step-by-step guide for homelabs.
Local AI inference has gone from niche experiment to homelab staple in the span of about two years. If you're running Proxmox VE, you've probably already got a capable GPU sitting in your server, and you might be wondering whether you really need to dedicate a full VM just to run Ollama. The answer is no—you can expose an AMD GPU directly to an unprivileged (or lightly privileged) LXC container, keep your overhead minimal, and still get full ROCm-accelerated inference. This guide walks you through exactly how to do it.
Why LXC Instead of a VM?
The short answer is efficiency. A KVM virtual machine gives you strong isolation and full GPU passthrough via VFIO, but it comes at a cost: you're emulating a complete hardware stack, maintaining a separate OS install, and dealing with the complexity of VFIO binding and PCIe passthrough configuration.
An LXC container shares the host kernel directly. This means device access is simpler—you expose /dev/dri and /dev/kfd directly into the container without any IOMMU gymnastics. You also get lower memory overhead, faster startup times, and easier snapshots.
The tradeoff is isolation. LXC containers share the host kernel, so a kernel exploit could theoretically escape the container. For a homelab running inference workloads on a trusted network, this is an acceptable tradeoff. For production multi-tenant environments, stick to VMs.
Prerequisites
Before you start, make sure you have the following in place:
- Proxmox VE 8.2 or newer (PVE 9 works great here)
- An AMD GPU with ROCm support (RX 6000 series, RX 7000 series, Instinct cards)
- The amdgpu kernel module loaded on the host
- At least 16 GB of RAM recommended for running mid-size models
- A Debian 12 or Ubuntu 24.04 LXC template downloaded
You can verify the AMD GPU is visible on the host with:
lspci | grep -i amd
ls /dev/dri/
ls /dev/kfd
You should see renderD128 (and possibly card0) under /dev/dri, and /dev/kfd for ROCm compute access. If /dev/kfd is missing, the amdgpu module may not be loaded with the right parameters.
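For reference, on a single-GPU host the output looks roughly like this (the PCI address, GPU name, and node names are illustrative and vary by system):

# lspci | grep -i amd
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX]
# ls /dev/dri/
by-path  card0  renderD128
# ls /dev/kfd
/dev/kfd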
Loading amdgpu with ROCm Support
On some systems, the amdgpu driver loads without compute support enabled by default. Check with:
cat /sys/module/amdgpu/parameters/gpu_recovery
dmesg | grep amdgpu | grep -i kfd
If KFD isn't initializing, add the module parameter:
echo 'options amdgpu ip_block_mask=0xff' >> /etc/modprobe.d/amdgpu.conf
update-initramfs -u
reboot
For most consumer GPUs (RDNA 2/3), the default parameters work fine and KFD loads automatically.
Creating the LXC Container
You can create the container via the Proxmox web UI or via pct on the command line. I'll use the CLI since it's faster to document and reproduce.
Download the Template
From the Proxmox shell, pull a Debian 12 template if you haven't already:
pveam update
pveam download local debian-12-standard_12.7-1_amd64.tar.zst
Substitute an ubuntu-24.04-standard template if you prefer Ubuntu; since AMD's ROCm packages officially target Ubuntu, it's arguably the safer base for the ROCm install later in this guide.
Create the Container
pct create 200 local:vztmpl/debian-12-standard_12.7-1_amd64.tar.zst \
--hostname ollama \
--cores 4 \
--memory 8192 \
--swap 2048 \
--rootfs local-lvm:32 \
--net0 name=eth0,bridge=vmbr0,ip=dhcp \
--unprivileged 1 \
--features nesting=1 \
--password changeme
Adjust container ID (200), storage pool (local-lvm), and bridge (vmbr0) to match your setup. The --unprivileged 1 flag is important—we want to keep the container unprivileged where possible. We'll handle the device permissions manually.
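Once the container exists, it's worth eyeballing the result before layering GPU configuration on top:

pct config 200

This prints the config file we're about to extend in the next section.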
Exposing the AMD GPU to the Container
This is the key part. We need to pass /dev/dri/renderD128 and /dev/kfd into the container. Proxmox LXC supports raw device passthrough via the container config file.
Get Device IDs
First, find the major:minor numbers for your GPU devices:
ls -la /dev/dri/renderD128
ls -la /dev/kfd
Example output:
crw-rw---- 1 root render 226, 128 Mar 20 10:00 /dev/dri/renderD128
crw-rw---- 1 root render 235,   0 Mar 20 10:00 /dev/kfd
Here 226:128 is renderD128 and 235:0 is kfd. These numbers matter for the cgroup device allow list.
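If you'd rather script this than read it off by eye, stat can print the numbers, though it reports them in hex. A small sketch to convert them to the decimal form the cgroup rules expect:

# stat reports the device major (%t) and minor (%T) in hex; convert to decimal
for dev in /dev/dri/renderD128 /dev/kfd; do
  printf '%s -> %d:%d\n' "$dev" 0x$(stat -c '%t' "$dev") 0x$(stat -c '%T' "$dev")
done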
Edit the Container Config
Open the container config directly:
nano /etc/pve/lxc/200.conf
Add the following lines at the bottom:
# AMD GPU device passthrough
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.cgroup2.devices.allow: c 235:0 rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file
The 226:0 entry covers /dev/dri/card0 (display output), 226:128 covers the render node, and 235:0 covers the KFD compute interface. Mounting the entire /dev/dri directory is convenient if you have multiple render nodes.
Fix Group Permissions
ROCm requires the container's user to be in the render and video groups. In an unprivileged container, group IDs are mapped. The host render group (GID 107 on Debian) maps to a different GID inside the container.
Check the host render GID:
getent group render
# render:x:107:
In an unprivileged container with the default UID/GID mapping (container IDs shifted up by 100000 on the host), host GID 107 has no counterpart inside the container, so the bind-mounted devices show up as owned by nobody:nogroup; conversely, a render group created inside the container with GID 107 actually corresponds to host GID 100107. Group-based access therefore doesn't line up out of the box. The fixes are a custom ID mapping, loosened device permissions, or a slightly privileged setup.
For context, the default unprivileged mapping that Proxmox applies is equivalent to these raw LXC entries:

lxc.idmap: u 0 100000 65536
lxc.idmap: g 0 100000 65536

To allow render device access without going fully privileged, you can either:
Option A: Run the container privileged (unprivileged: 0) so host and container IDs match up and the render group works directly. This is simpler and perfectly acceptable for a single-user homelab.
Option B: Use lxc.hook.pre-start to chmod the devices before the container starts (requires a host-side script).
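For the curious, the Option B hook can be tiny. A minimal sketch, assuming the script path is your own choice (it isn't a Proxmox convention) and that you mark the script executable:

#!/bin/bash
# /usr/local/bin/gpu-perms.sh -- loosen GPU device node permissions before container start
chmod 0666 /dev/kfd /dev/dri/renderD128

Then reference it in /etc/pve/lxc/200.conf with lxc.hook.pre-start: /usr/local/bin/gpu-perms.sh.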
For most homelab users, Option A is the path of least resistance. One caveat: Proxmox treats the unprivileged flag as fixed after creation (the rootfs ownership is ID-shifted), so you can't simply pct set it. Either recreate the container with --unprivileged 0, or convert it via backup and restore:

vzdump 200 --mode stop --storage local
pct restore 200 /var/lib/vz/dump/vzdump-lxc-200-<timestamp>.tar.zst --force --unprivileged 0

Adjust the archive path to the file vzdump actually produced.
Then start the container:
pct start 200
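Before installing anything, confirm the devices actually made it inside:

pct exec 200 -- ls -la /dev/dri /dev/kfd

If either path is missing, revisit the lxc.mount.entry lines before going further.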
Installing ROCm and Ollama in the Container
Shell into the container:
pct enter 200
Install ROCm
AMD's ROCm stack is the runtime Ollama uses for GPU acceleration on AMD hardware. Install it via AMD's official repo:
apt update && apt install -y curl gnupg2 wget
# Add AMD's repo installer package (this build targets Ubuntu 24.04 "noble";
# pick the release matching your container's distribution from repo.radeon.com)
wget https://repo.radeon.com/amdgpu-install/6.3/ubuntu/noble/amdgpu-install_6.3.60300-1_all.deb
dpkg -i amdgpu-install_6.3.60300-1_all.deb
amdgpu-install --usecase=rocm --no-dkms
The --no-dkms flag skips kernel module installation—we don't need that inside the container since we're using the host kernel's amdgpu driver. This significantly speeds up the install.
Verify ROCm sees the GPU:
rocminfo | grep -A5 'Agent 2'
You should see your GPU listed as an HSA agent with compute capability details.
Also check with:
rocm-smi
If rocm-smi shows your GPU with memory and temperature data, you're in good shape.
Add User to Groups
Make sure your user (or root) is in the render and video groups:
usermod -aG render,video root
If you're running Ollama as a dedicated service user, create one and add them to those groups instead.
Install Ollama
Ollama has an official install script that auto-detects ROCm:
curl -fsSL https://ollama.com/install.sh | sh
Ollama will detect the AMD GPU via ROCm and configure itself accordingly. After install, verify the service is running:
systemctl status ollama
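The install script sets up its own ollama system user for the service. It normally adds it to the GPU groups on ROCm systems, but it's cheap to verify and fix (group names per the Debian defaults above):

id ollama
usermod -aG render,video ollama
systemctl restart ollama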
And test GPU detection:
ollama run llama3.2:3b
Watch GPU memory usage on the host in a separate terminal:
rocm-smi --showmeminfo vram
If VRAM usage jumps when you run the model, the GPU is being used.
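Ollama can also report this directly: ollama ps shows whether a loaded model landed on the GPU (output below is illustrative):

ollama ps
# NAME          ID    SIZE    PROCESSOR  UNTIL
# llama3.2:3b   ...   3.4 GB  100% GPU   4 minutes from now

Anything other than 100% GPU in the PROCESSOR column means some or all layers fell back to CPU.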
Configuring Ollama as a Service
By default, Ollama only listens on 127.0.0.1:11434. To access it from other machines on your network (or from Open WebUI running in another container), you need to bind it to 0.0.0.0.
Edit the systemd service override:
systemctl edit ollama
Add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_KEEP_ALIVE=5m"
Save and reload:
systemctl daemon-reload
systemctl restart ollama
OLLAMA_KEEP_ALIVE controls how long models stay loaded in VRAM between requests. 5m is a good balance between responsiveness and VRAM conservation.
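A quick way to confirm the new bind address from another machine on the network (substitute your container's IP):

curl http://192.168.1.50:11434/api/tags

This returns a JSON list of the models available locally.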
Useful Ollama Environment Variables
| Variable | Description | Example |
|---|---|---|
| `OLLAMA_HOST` | Bind address and port | `0.0.0.0:11434` |
| `OLLAMA_KEEP_ALIVE` | Model idle timeout | `5m`, `0` (unload immediately), `-1` (never unload) |
| `OLLAMA_MAX_LOADED_MODELS` | Concurrent models in VRAM | `2` |
| `OLLAMA_NUM_PARALLEL` | Parallel request handling | `4` |
| `OLLAMA_MODELS` | Custom model storage path | `/data/ollama/models` |
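Putting a few of these together, a fuller override might look like this (values are illustrative, not recommendations):

[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_KEEP_ALIVE=5m"
Environment="OLLAMA_MAX_LOADED_MODELS=2"
Environment="OLLAMA_NUM_PARALLEL=4"
Environment="OLLAMA_MODELS=/data/ollama/models"

Remember systemctl daemon-reload and systemctl restart ollama after any change.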
Pulling and Running Models
With Ollama running and the GPU accessible, pull a few models to test:
# Lightweight, fast
ollama pull llama3.2:3b

# Mid-size, great for coding
ollama pull qwen2.5-coder:7b

# Reasoning model
ollama pull deepseek-r1:8b
For a quick benchmark to confirm GPU acceleration:
ollama run llama3.2:3b "Write a haiku about Proxmox."
GPU-accelerated inference on an RX 7900 XTX should yield 50-100+ tokens per second for a 3B model. If you're seeing 5-10 tokens per second, you're likely on CPU—double-check the ROCm setup and group permissions.
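For hard numbers instead of a feel test, add --verbose, which makes Ollama print timing statistics after the response (the figure below is illustrative):

ollama run llama3.2:3b --verbose "Write a haiku about Proxmox."
# ...
# eval rate:            87.32 tokens/s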
Connecting Open WebUI
Open WebUI is the most popular frontend for Ollama. You can run it in a separate LXC container or directly on the same one. A separate container keeps things cleaner.
Create a minimal Ubuntu container with Docker installed (or reuse an existing Docker host), then run Open WebUI:
docker run -d \
--name open-webui \
-p 3000:8080 \
-e OLLAMA_BASE_URL=http://192.168.1.50:11434 \
-v open-webui:/app/backend/data \
--restart unless-stopped \
ghcr.io/open-webui/open-webui:main
Replace 192.168.1.50 with your Ollama container's IP. You can find it with pct exec 200 -- ip addr show eth0.
Troubleshooting Common Issues
/dev/kfd: Permission Denied
This is the most common issue. Inside the container, run:
ls -la /dev/kfd
If it shows root:root with mode 0600, the render group mapping isn't working. Quick fix:
# From host
chmod 666 /dev/kfd
For a persistent fix, add a udev rule on the host:
echo 'KERNEL=="kfd", GROUP="render", MODE="0666"' > /etc/udev/rules.d/70-kfd.rules
udevadm control --reload-rules && udevadm trigger
rocminfo Shows No GPU Agent
If rocminfo only shows CPU agents, the GPU isn't accessible. Check:
# Inside container
ls -la /dev/dri/
# Should show renderD128
Also check the kernel log on the host:
dmesg | grep -i kfd
If /dev/dri/renderD128 is missing inside the container, the mount entry in the LXC config didn't apply. Restart the container after editing the config and verify with pct config 200.
Ollama Uses CPU Despite GPU Being Present
Check Ollama logs:
journalctl -u ollama -f
Look for lines mentioning ROCm initialization. If you see "no ROCm GPU detected", Ollama's bundled ROCm library may not match your GPU's gfx architecture. Set the override:
# In /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
Replace 11.0.0 with the version matching your GPU's gfx target: check with rocminfo | grep gfx, then map gfx1100 to 11.0.0, gfx1030 to 10.3.0, and so on. This override is commonly needed for RDNA 3 cards (gfx1100, gfx1101, gfx1102).
Container Fails to Start After Config Changes
If the container won't start after editing the config, check for syntax errors:
pct config 200
pct start 200 --debug
Missing device nodes on the host (e.g., /dev/kfd doesn't exist) will cause the lxc.mount.entry to fail silently or hard-fail depending on the optional flag. Double-check the host devices exist before starting.
Performance Expectations
Here's a rough guide for what to expect on common AMD GPUs:
| GPU | VRAM | ~Tokens/sec (7B model) |
|---|---|---|
| RX 6600 | 8 GB | 25–40 t/s |
| RX 6700 XT | 12 GB | 35–55 t/s |
| RX 7900 GRE | 16 GB | 60–85 t/s |
| RX 7900 XTX | 24 GB | 80–110 t/s |
| Radeon PRO W7800 | 32 GB | 90–120 t/s |
These are approximate figures with default Ollama settings and Q4_K_M quantization. Performance varies by model architecture, context length, and concurrent load.
Conclusion
Running Ollama in a Proxmox LXC container with AMD GPU access is one of the most efficient ways to add local AI inference to a homelab. You skip the overhead of a full VM, avoid the complexity of VFIO passthrough, and still get hardware-accelerated inference through ROCm.
The main gotcha is device permissions—getting /dev/kfd and /dev/dri/renderD128 accessible inside the container with the right group permissions. Once that's sorted, ROCm and Ollama install cleanly and the GPU is fully utilized.
From here, you can expand the setup: add Open WebUI for a ChatGPT-like interface, experiment with function calling via tools like LiteLLM, or set up model routing across multiple containers. The lightweight nature of LXC makes it easy to run Ollama alongside your other Proxmox workloads without dedicating a full machine to AI inference.