Recent Posts
Running Qwen3.6 35B Locally with Ollama and VS Code Integration
Overview
Running large language models locally is becoming increasingly practical, even for developers without access to massive GPU clusters.
In this post, I walk through how to:
Run Qwen3.6 35B (A3B, Q4_K_M quantized) locally using Ollama
Integrate the model into VS Code
Use it as a local coding assistant
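As a sketch of the VS Code side of this setup: one common way to wire a local Ollama server into the editor is via the Continue extension. The snippet below is a minimal, hypothetical example of its config.json; the extension choice, the model tag, and the title are assumptions for illustration, not necessarily what the full post uses.

```json
{
  "models": [
    {
      "title": "Qwen3 (local)",
      "provider": "ollama",
      "model": "qwen3:30b-a3b",
      "apiBase": "http://localhost:11434"
    }
  ]
}
```

With a config along these lines, the local model shows up as a selectable assistant inside VS Code, with all inference staying on the machine.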
This setup is especially useful for:
Air-gapped environments
read more
Running vLLM on AMD AI MAX+ 395 (ROCm, Ubuntu 24.04)
Finally, I managed to get vLLM running on my AMD AI MAX+ 395 GPU on Ubuntu 24.04.
It was not straightforward — ROCm support on Ryzen AI (gfx1151) is still evolving, and I ran into multiple low-level GPU faults before finding a stable setup.
This post documents:
What didn’t work
The errors I encountered
The working configuration
Hopefully this saves you a few hours (or days).
read more
Can Claude Code Use GitHub Copilot as a Backend? A Practical Exploration
Introduction
Recently, I’ve been experimenting with a variety of LLM tooling ecosystems, including:
Claude Code
Codex via OpenRouter
Ollama
LM Studio
vLLM
LiteLLM
My goal is to better understand the underlying technologies and explore how to operate these tools in air-gapped or controlled environments.
In many enterprise settings, developers are allowed to use GitHub Copilot, but not Claude Code.
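To make the exploration concrete: a common pattern for putting a different backend behind an Anthropic-style client is an OpenAI-compatible proxy such as LiteLLM. The fragment below is a hypothetical sketch of a LiteLLM config.yaml; the model alias, backend identifier, and api_base are placeholders, not a confirmed Copilot bridge.

```yaml
# Hypothetical LiteLLM proxy config: expose a backend model
# under an alias the client tool requests.
model_list:
  - model_name: claude-proxy            # alias the client asks for (placeholder)
    litellm_params:
      model: openai/backend-model       # placeholder backend route
      api_base: http://localhost:8080/v1
```

Whether Copilot specifically can sit on the backend side of such a proxy is exactly the question the post explores.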
read more