Run Ollama on Proxmox LXC with AMD GPU Access

Expose an AMD GPU to a Proxmox LXC container for local AI inference with Ollama—no full VM passthrough required. Step-by-step guide for homelabs.


Local AI inference has gone from niche experiment to homelab staple in the span of about two years. If you're running Proxmox VE, you've probably already got a capable GPU sitting in your server, and you might be wondering whether you really need to dedicate a full VM just to run Ollama. The answer is no—you can expose an AMD GPU directly to an unprivileged (or lightly privileged) LXC container, keep your overhead minimal, and still get full ROCm-accelerated inference. This guide walks you through exactly how to do it.

Why LXC Instead of a VM?

The short answer is efficiency. A KVM virtual machine gives you strong isolation and full GPU passthrough via VFIO, but it comes at a cost: you're emulating a complete hardware stack, maintaining a separate OS install, and dealing with the complexity of VFIO binding and PCIe passthrough configuration.

An LXC container shares the host kernel directly. This means device access is simpler—you expose /dev/dri and /dev/kfd directly into the container without any IOMMU gymnastics. You also get lower memory overhead, faster startup times, and easier snapshots.

The tradeoff is isolation. LXC containers share the host kernel, so a kernel exploit could theoretically escape the container. For a homelab running inference workloads on a trusted network, this is an acceptable tradeoff. For production multi-tenant environments, stick to VMs.

Prerequisites

Before you start, make sure you have the following in place:

  • Proxmox VE 8.2 or newer (PVE 9 works great here)
  • An AMD GPU with ROCm support (RX 6000 series, RX 7000 series, Instinct cards)
  • The amdgpu kernel module loaded on the host
  • At least 16 GB of RAM recommended for running mid-size models
  • A Debian 12 or Ubuntu 24.04 LXC template downloaded

You can verify the AMD GPU is visible on the host with:

lspci | grep -i amd
ls /dev/dri/
ls /dev/kfd

You should see renderD128 (and possibly card0) under /dev/dri, and /dev/kfd for ROCm compute access. If /dev/kfd is missing, the amdgpu module may not be loaded with the right parameters.
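If the host has more than one GPU (an iGPU plus the discrete card, for example), you can confirm which driver backs the render node:

readlink /sys/class/drm/renderD128/device/driver
# The link should end in /amdgpu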

Loading amdgpu with ROCm Support

On some systems, the amdgpu driver loads without compute support enabled by default. Check with:

cat /sys/module/amdgpu/parameters/gpu_recovery
dmesg | grep amdgpu | grep -i kfd

If KFD isn't initializing, add the module parameter:

echo 'options amdgpu ip_block_mask=0xff' >> /etc/modprobe.d/amdgpu.conf
update-initramfs -u
reboot

For most consumer GPUs (RDNA 2/3), the default parameters work fine and KFD loads automatically.

Creating the LXC Container

You can create the container via the Proxmox web UI or via pct on the command line. I'll use the CLI since it's faster to document and reproduce.

Download the Template

From the Proxmox shell, pull a Debian 12 template if you haven't already:

pveam update
pveam download local debian-12-standard_12.7-1_amd64.tar.zst

Substitute ubuntu-24.04-standard if you prefer Ubuntu.
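If you're not sure of the exact template file name, list what's available and what you've already downloaded:

pveam available --section system | grep -E 'debian-12|ubuntu-24.04'
pveam list local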

Create the Container

pct create 200 local:vztmpl/debian-12-standard_12.7-1_amd64.tar.zst \
  --hostname ollama \
  --cores 4 \
  --memory 8192 \
  --swap 2048 \
  --rootfs local-lvm:32 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp \
  --unprivileged 1 \
  --features nesting=1 \
  --password changeme

Adjust container ID (200), storage pool (local-lvm), and bridge (vmbr0) to match your setup. The --unprivileged 1 flag is important—we want to keep the container unprivileged where possible. We'll handle the device permissions manually.
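You can sanity-check the resulting config and adjust resources later without recreating the container:

pct config 200
# Give it more RAM and cores once you start loading larger models
pct set 200 --memory 16384 --cores 6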

Exposing the AMD GPU to the Container

This is the key part. We need to pass /dev/dri/renderD128 and /dev/kfd into the container. Proxmox LXC supports raw device passthrough via the container config file.

Get Device IDs

First, find the major:minor numbers for your GPU devices:

ls -la /dev/dri/renderD128
ls -la /dev/kfd

Example output:

crw-rw---- 1 root render 226, 128 Mar 20 10:00 /dev/dri/renderD128
crw-rw---- 1 root render 235,   0 Mar 20 10:00 /dev/kfd

Here 226:128 is renderD128 and 235:0 is kfd. These numbers matter for the cgroup device allow list.
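If you'd rather not read the numbers off by hand, a short loop on the host can print the allow lines for you (a sketch; adjust the device list to whatever exists on your system):

# stat -c %t/%T print major/minor in hex, so convert with $((0x...))
for dev in /dev/kfd /dev/dri/card0 /dev/dri/renderD128; do
  [ -e "$dev" ] || continue
  printf 'lxc.cgroup2.devices.allow: c %d:%d rwm\n' \
    "$((0x$(stat -c %t "$dev")))" "$((0x$(stat -c %T "$dev")))"
done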

Edit the Container Config

Open the container config directly:

nano /etc/pve/lxc/200.conf

Add the following lines at the bottom:

# AMD GPU device passthrough
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.cgroup2.devices.allow: c 235:0 rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file

The 226:0 entry covers /dev/dri/card0 (display output), 226:128 covers the render node, and 235:0 covers the KFD compute interface. Mounting the entire /dev/dri directory is convenient if you have multiple render nodes.
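As a side note, newer Proxmox releases (8.2 and later) also ship a native dev[n] passthrough option that creates the device node and cgroup rule for you; treat the syntax below as a sketch and check pct(1) on your version before relying on it:

# Native passthrough alternative (PVE 8.2+); the gid value is an example and
# should match the render group GID inside the container
pct set 200 --dev0 /dev/kfd,gid=104
pct set 200 --dev1 /dev/dri/renderD128,gid=104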

Fix Group Permissions

ROCm requires the container's user to be in the render and video groups. In an unprivileged container, group IDs are mapped. The host render group (GID 107 on Debian) maps to a different GID inside the container.

Check the host render GID:

getent group render
# render:x:107:

In an unprivileged container with the default UID/GID mapping (host offset 100000), GID 107 on the host maps to GID 100107 inside the container—which doesn't correspond to any named group inside. The simplest fix is to add a render group inside the container with the correct mapped GID, or to use a slightly privileged setup.

For context, the default unprivileged mapping looks like this in the container config (it's implied by unprivileged: 1, so you don't need to add it yourself):

lxc.idmap: u 0 100000 65536
lxc.idmap: g 0 100000 65536

With that mapping in place, the host's render GID isn't directly usable inside the container, so you have two practical options:

Option A: Run the container privileged by setting --unprivileged 0. The container then shares the host's UID/GID space, so the host's group permissions on the GPU devices apply directly. This is simpler and perfectly acceptable for a single-user homelab.

Option B: Use lxc.hook.pre-start to chmod the devices before the container starts (requires a host-side script).
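If you go with Option B, a minimal host-side hook might look like this (hypothetical script path):

#!/bin/sh
# /usr/local/bin/lxc-gpu-perms.sh (hypothetical path)
# Loosen GPU device permissions so the container's mapped UIDs can open them
chmod 0666 /dev/kfd /dev/dri/renderD128
exit 0

Make it executable (chmod +x) and reference it from the container config with lxc.hook.pre-start: /usr/local/bin/lxc-gpu-perms.sh.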

For most homelab users, Option A is the path of least resistance:

# In /etc/pve/lxc/200.conf, change or ensure:
# unprivileged: 0

Or via the CLI:

pct set 200 --unprivileged 0

Then start the container:

pct start 200
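Before installing anything, confirm the device nodes actually show up inside the container:

pct exec 200 -- ls -la /dev/kfd /dev/dri/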

Installing ROCm and Ollama in the Container

Shell into the container:

pct enter 200

Install ROCm

AMD's ROCm stack is the runtime Ollama uses for GPU acceleration on AMD hardware. Install it via AMD's official repo:

apt update && apt install -y curl gnupg2 wget

# Add the AMD ROCm repository and run the installer
wget https://repo.radeon.com/amdgpu-install/6.3/ubuntu/noble/amdgpu-install_6.3.60300-1_all.deb
dpkg -i amdgpu-install_6.3.60300-1_all.deb
amdgpu-install --usecase=rocm --no-dkms

The --no-dkms flag skips kernel module installation—we don't need that inside the container since we're using the host kernel's amdgpu driver. This significantly speeds up the install. The installer package above corresponds to Ubuntu 24.04 (noble); pick the variant that matches your container's release from repo.radeon.com.

Verify ROCm sees the GPU:

rocminfo | grep -A5 'Agent 2'

You should see your GPU listed as an HSA agent with compute capability details.

Also check with:

rocm-smi

If rocm-smi shows your GPU with memory and temperature data, you're in good shape.

Add User to Groups

Make sure your user (or root) is in the render and video groups:

usermod -aG render,video root

If you're running Ollama as a dedicated service user, create one and add them to those groups instead.
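For example, once the Ollama installer in the next step has created its ollama service user, you can make sure it has the same access (the installer typically handles this itself, but it doesn't hurt to confirm):

# Run after installing Ollama (the install script creates the ollama user)
usermod -aG render,video ollama
systemctl restart ollama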

Install Ollama

Ollama has an official install script that auto-detects ROCm:

curl -fsSL https://ollama.com/install.sh | sh

Ollama will detect the AMD GPU via ROCm and configure itself accordingly. After install, verify the service is running:

systemctl status ollama

And test GPU detection:

ollama run llama3.2:3b

Watch GPU memory usage on the host in a separate terminal:

rocm-smi --showmeminfo vram

If VRAM usage jumps when you run the model, the GPU is being used.

Configuring Ollama as a Service

By default, Ollama only listens on 127.0.0.1:11434. To access it from other machines on your network (or from Open WebUI running in another container), you need to bind it to 0.0.0.0.

Edit the systemd service override:

systemctl edit ollama

Add:

[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_KEEP_ALIVE=5m"

Save and reload:

systemctl daemon-reload
systemctl restart ollama

OLLAMA_KEEP_ALIVE controls how long models stay loaded in VRAM between requests. 5m is a good balance between responsiveness and VRAM conservation.
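You can check what's currently loaded, and when it's due to be evicted, with ollama ps:

ollama ps
# NAME          ID    SIZE     PROCESSOR    UNTIL                   (illustrative output)
# llama3.2:3b   ...   3.4 GB   100% GPU     4 minutes from now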

Useful Ollama Environment Variables

Variable                   Description                  Example
OLLAMA_HOST                Bind address and port        0.0.0.0:11434
OLLAMA_KEEP_ALIVE          Model idle timeout           5m, 0 (unload immediately), -1 (never unload)
OLLAMA_MAX_LOADED_MODELS   Concurrent models in VRAM    2
OLLAMA_NUM_PARALLEL        Parallel request handling    4
OLLAMA_MODELS              Custom model storage path    /data/ollama/models
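Putting a few of these together, a fuller override might look like this (values are illustrative; tune them to your VRAM and workload):

[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_KEEP_ALIVE=5m"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_MODELS=/data/ollama/models"

Remember to run systemctl daemon-reload and restart the service after changing it.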

Pulling and Running Models

With Ollama running and the GPU accessible, pull a few models to test:

# Lightweight, fast
ollama pull llama3.2:3b

# Mid-size, great for coding
ollama pull qwen2.5-coder:7b

# Reasoning model
ollama pull deepseek-r1:8b

For a quick benchmark to confirm GPU acceleration:

ollama run llama3.2:3b "Write a haiku about Proxmox."

GPU-accelerated inference on an RX 7900 XTX should yield 50-100+ tokens per second for a 3B model. If you're seeing 5-10 tokens per second, you're likely on CPU—double-check the ROCm setup and group permissions.
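To get a hard number rather than eyeballing it, run with --verbose, which prints timing statistics (including eval rate in tokens per second) after the response:

ollama run llama3.2:3b --verbose "Write a haiku about Proxmox."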

Connecting Open WebUI

Open WebUI is the most popular frontend for Ollama. You can run it in a separate LXC container or directly on the same one. A separate container keeps things cleaner.

Create a minimal Ubuntu container, install Docker or run Open WebUI directly:

docker run -d \
  --name open-webui \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://192.168.1.50:11434 \
  -v open-webui:/app/backend/data \
  --restart unless-stopped \
  ghcr.io/open-webui/open-webui:main

Replace 192.168.1.50 with your Ollama container's IP. You can find it with pct exec 200 -- ip addr show eth0.
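Before wiring up the UI, it's worth confirming the Ollama API is reachable from wherever Open WebUI will run:

curl http://192.168.1.50:11434/api/version
# Any JSON response (e.g. {"version":"..."}) means the bind address and port are correct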

Troubleshooting Common Issues

/dev/kfd: Permission Denied

This is the most common issue. Inside the container, run:

ls -la /dev/kfd

If it shows root:root with mode 0600, the render group mapping isn't working. Quick fix:

# From host
chmod 666 /dev/kfd

For a persistent fix, add a udev rule on the host:

echo 'KERNEL=="kfd", GROUP="render", MODE="0666"' > /etc/udev/rules.d/70-kfd.rules
udevadm control --reload-rules && udevadm trigger

rocminfo Shows No GPU Agent

If rocminfo only shows CPU agents, the GPU isn't accessible. Check:

# Inside container
ls -la /dev/dri/
# Should show renderD128

Also check the kernel log:

dmesg | grep -i kfd

If /dev/dri/renderD128 is missing inside the container, the mount entry in the LXC config didn't apply. Restart the container after editing the config and verify with pct config 200.

Ollama Uses CPU Despite GPU Being Present

Check Ollama logs:

journalctl -u ollama -f

Look for lines mentioning ROCm initialization. If you see "no ROCm GPU detected", Ollama's bundled ROCm library may not match your GPU's gfx architecture. Set the override:

# In /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"

Replace 11.0.0 with your GPU's gfx version (check with rocminfo | grep gfx). This is commonly needed for RDNA 3 cards (gfx1100, gfx1101, gfx1102).
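The override value is simply the gfx number split into major.minor.patch. A few common cases (not exhaustive; 10.3.0 is also the usual spoof target for other RDNA 2 parts):

rocminfo | grep -m1 -o 'gfx[0-9a-f]*'
# gfx1030 (RX 6800/6900 XT)  -> HSA_OVERRIDE_GFX_VERSION=10.3.0
# gfx1100 (RX 7900 XT/XTX)   -> HSA_OVERRIDE_GFX_VERSION=11.0.0
# gfx1102 (RX 7600)          -> HSA_OVERRIDE_GFX_VERSION=11.0.2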

Container Fails to Start After Config Changes

If the container won't start after editing the config, check for syntax errors:

pct config 200
pct start 200 --debug

Missing device nodes on the host (e.g., /dev/kfd doesn't exist) will cause the lxc.mount.entry to fail silently or hard-fail depending on the optional flag. Double-check the host devices exist before starting.

Performance Expectations

Here's a rough guide for what to expect on common AMD GPUs:

GPU                 VRAM     ~Tokens/sec (7B model)
RX 6600             8 GB     25–40 t/s
RX 6700 XT          12 GB    35–55 t/s
RX 7900 GRE         16 GB    60–85 t/s
RX 7900 XTX         24 GB    80–110 t/s
Radeon PRO W7800    32 GB    90–120 t/s

These are approximate figures with default Ollama settings and Q4_K_M quantization. Performance varies by model architecture, context length, and concurrent load.

Conclusion

Running Ollama in a Proxmox LXC container with AMD GPU access is one of the most efficient ways to add local AI inference to a homelab. You skip the overhead of a full VM, avoid the complexity of VFIO passthrough, and still get hardware-accelerated inference through ROCm.

The main gotcha is device permissions—getting /dev/kfd and /dev/dri/renderD128 accessible inside the container with the right group permissions. Once that's sorted, ROCm and Ollama install cleanly and the GPU is fully utilized.

From here, you can expand the setup: add Open WebUI for a ChatGPT-like interface, experiment with function calling via tools like LiteLLM, or set up model routing across multiple containers. The lightweight nature of LXC makes it easy to run Ollama alongside your other Proxmox workloads without dedicating a full machine to AI inference.
