Below you will find pages that utilize the taxonomy term “llama”
Posts
Running Qwen3.6 MTP GGUF on AMD AI MAX 395 with llama.cpp ROCm
Background
Support for multi-token prediction (MTP) was recently merged into llama.cpp through the following pull request:
MTP support merged into llama.cpp
After the merge, I wanted to test MTP models on my mini-PC powered by the AMD AI MAX 395. I tried several approaches, including manually building llama.cpp and using Unsloth GGUF models directly. Despite multiple attempts, I could not get a stable working setup.
I also searched through GitHub issues and asked several AI assistants, including ChatGPT, Gemini, and DeepSeek.