Running vLLM on AMD AI MAX+ 395 (ROCm, Ubuntu 24.04)
Finally, I managed to get vLLM running on my AMD AI MAX+ 395 GPU on Ubuntu 24.04.
It was not straightforward — ROCm support on Ryzen AI (gfx1151) is still evolving, and I ran into multiple low-level GPU faults before finding a stable setup.
This post documents:

- What didn’t work
- The errors I encountered
- The working configuration
Hopefully this saves you a few hours (or days).
✅ Final Working Setup
Instead of fighting local installs, I used AMD’s official ROCm + vLLM Docker image.
docker run -it --rm \
--network=host \
--group-add=video \
--ipc=host \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--device /dev/kfd \
--device /dev/dri \
-v ~/.cache/huggingface:/app/models \
-e HF_HOME="/app/models" \
rocm/vllm-dev:rocm7.2_navi_ubuntu24.04_py3.12_pytorch_2.9_vllm_0.14.0rc0
Inside the container:
vllm serve "Qwen/Qwen2.5-1.5B-Instruct" \
--dtype float16 \
--max-model-len 8192 \
--gpu-memory-utilization 0.4
This combination finally worked reliably.

Key points:

- ROCm 7.2 base image
- Python 3.12 + PyTorch 2.9 (preconfigured)
- FP16 instead of BF16
- Conservative GPU memory usage (0.4)
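To get an intuition for what `--gpu-memory-utilization 0.4` does: vLLM caps its total allocation (weights, KV cache, activations) at that fraction of the GPU-visible memory. A minimal sketch of the arithmetic — the 96 GiB figure is a hypothetical example, since the VRAM actually visible to ROCm on this APU depends on your BIOS unified-memory split:

```python
def vllm_memory_budget_gib(visible_vram_gib: float, utilization: float) -> float:
    """Rough upper bound on what vLLM will allocate: weights + KV cache + activations."""
    return visible_vram_gib * utilization

# Hypothetical: 96 GiB of unified memory exposed as GPU-visible VRAM.
print(f"{vllm_memory_budget_gib(96.0, 0.4):.1f} GiB")  # 38.4 GiB
```

Starting low and raising the fraction once the model loads cleanly is the conservative path on hardware where the runtime is still shaky.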
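Once `vllm serve` is up, it exposes vLLM's OpenAI-compatible API (by default on port 8000). A stdlib-only client sketch — the default port and the `/v1/chat/completions` path are per vLLM's docs; adjust `base_url` if you passed `--port`:

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       model: str = "Qwen/Qwen2.5-1.5B-Instruct",
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a request against vLLM's OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# With the server running:
#   resp = urllib.request.urlopen(build_chat_request("Say hello."))
#   print(json.loads(resp.read())["choices"][0]["message"]["content"])
```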
❌ What Didn’t Work
Before reaching this setup, I tried both ROCm 7.2.0 and 7.1.1 manually.
Both failed — but in different ways.
🚨 Issue 1: ROCm 7.2.0 GPU Page Fault
With ROCm 7.2.0, vLLM crashed during initialization with a GPU memory access fault.
Error:
Memory access fault by GPU node-1 Reason: Page not present or supervisor privilege
dmesg showed:
[  131.104308] amdgpu: [gfxhub] page fault
[  131.104330] Process VLLM::EngineCor
[  131.104334] GCVM_L2_PROTECTION_FAULT_STATUS:0x00800932
[  131.104341] PERMISSION_FAULTS: 0x3
[  131.104342] MAPPING_ERROR: 0x1
This indicates:

- GPU virtual memory mapping failure
- Likely invalid or unmapped memory access
- Happens inside the ROCm/HSA layer, not user code
In short: hard crash at the driver/runtime level
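If you hit this and want to file a ROCm/amdgpu issue, the hex fault fields are the part worth extracting. A small sketch that pulls them out of a dmesg dump (the sample text is the excerpt above; the field names are taken verbatim from that log):

```python
import re

# dmesg excerpt from the page-fault crash above.
LOG = """\
[  131.104308] amdgpu: [gfxhub] page fault
[  131.104330] Process VLLM::EngineCor
[  131.104334] GCVM_L2_PROTECTION_FAULT_STATUS:0x00800932
[  131.104341] PERMISSION_FAULTS: 0x3
[  131.104342] MAPPING_ERROR: 0x1
"""

def extract_fault_fields(text: str) -> dict:
    """Collect the amdgpu fault registers reported in a dmesg dump."""
    fields = {}
    for key in ("GCVM_L2_PROTECTION_FAULT_STATUS",
                "PERMISSION_FAULTS",
                "MAPPING_ERROR"):
        m = re.search(rf"{key}:\s*0x([0-9A-Fa-f]+)", text)
        if m:
            fields[key] = int(m.group(1), 16)
    return fields

print(extract_fault_fields(LOG))
```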
🚨 Issue 2: ROCm 7.1.1 Silent Segfault
Downgrading to ROCm 7.1.1 did not fix the issue.
Instead, I got a silent crash:
VLLM::EngineCor: segfault in libhsa-runtime64.so
No useful logs from vLLM itself — only kernel-level signals.
This is worse:
- No clear error message
- Happens inside libhsa-runtime64
- Suggests runtime incompatibility with gfx1151
🤔 Observations
From these failures, a few patterns became clear:

- ROCm on Ryzen AI (gfx1151) is still maturing
- Errors often occur in:
  - the HSA runtime (libhsa-runtime64)
  - the GPU VM (gfxhub faults)
- Different ROCm versions fail differently:
  - 7.2.0 → explicit GPU page fault
  - 7.1.1 → silent segfault

This strongly suggests:

- Not a vLLM issue
- Not a model issue
- But a ROCm + driver + hardware interaction problem
💡 Why Docker Worked
The AMD-provided Docker image solved the problem because it gives:

- Matching versions of:
  - ROCm
  - PyTorch
  - vLLM
- Correct build flags for gfx1151
- A pre-tested environment

This avoids:

- ABI mismatches
- Incorrect HIP/HSA configurations
- Python environment inconsistencies
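Before pulling the image, it's worth confirming that the device nodes the `docker run` flags reference actually exist on the host (the `--device /dev/kfd` and `--device /dev/dri` mounts fail silently in confusing ways otherwise). A small host-side sketch — it only checks node existence, not permissions, so you still need the `video` group membership the `--group-add=video` flag assumes:

```python
import os

def preflight() -> dict:
    """Check for the ROCm device nodes the docker run command maps in."""
    return {
        "/dev/kfd": os.path.exists("/dev/kfd"),   # compute (HSA kernel driver)
        "/dev/dri": os.path.exists("/dev/dri"),   # render nodes
    }

for dev, ok in preflight().items():
    print(f"{dev}: {'found' if ok else 'MISSING - check the amdgpu driver'}")
```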
🧠 Lessons Learned
If you’re running vLLM on AMD AI MAX (or a similar APU):

- Don’t mix ROCm versions manually
- Prefer official AMD containers
- Start with:
  - small models (1B–7B)
  - FP16
  - low memory utilization
- If you see:
  - gfxhub page faults
  - libhsa-runtime64.so segfaults

  👉 it’s very likely not your code
🚀 What’s Next
Now that vLLM is working:

- Try larger models
- Experiment with quantization (AWQ / GPTQ)
- Benchmark vs CUDA setups
But for now — I’m just happy it runs 😄
📌 Summary
| Component | Version |
|-----------|---------|
| OS | Ubuntu 24.04 |
| GPU | AMD AI MAX+ 395 (gfx1151) |
| Runtime | ROCm 7.2 (Docker) |
| Python | 3.12 |
| PyTorch | 2.9 |
| vLLM | 0.14.0rc0 |
| Model | Qwen2.5-1.5B-Instruct |
If you’re working on similar hardware, feel free to reach out — this space is still moving fast, and shared knowledge helps a lot.