Below you will find pages that utilize the taxonomy term “ollama”
Posts
Running Qwen3 30B Locally with Ollama and VS Code Integration
Overview
Running large language models locally is becoming increasingly practical, even for developers without access to massive GPU clusters.
In this post, I walk through how to:
Run Qwen3 30B (A3B, Q4_K_M quantized) locally using Ollama
Integrate the model into VS Code
Use it as a local coding assistant
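For a sense of what the Ollama side of this looks like, here is a minimal sketch of querying a locally running Ollama server over its REST API from Python. The model tag `qwen3:30b-a3b` is an assumption and should match whatever you actually pulled (check `ollama list`).

```python
# Minimal sketch: query a local Ollama server over its REST API.
# Assumes `ollama serve` is running on the default port 11434, and that
# the model tag below matches a model you have pulled (an assumption here).
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "qwen3:30b-a3b"  # assumed tag; replace with the output of `ollama list`

def ask(prompt: str) -> str:
    """Send a single chat message and return the model's reply."""
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,  # return one JSON object instead of a token stream
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

if __name__ == "__main__":
    print(ask("Write a Python function that reverses a string."))
```

Editor integrations work the same way underneath: they point at this local endpoint instead of a cloud API, which is what makes the setup usable offline.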
This setup is especially useful for:
Air-gapped environments
Posts
Setting Up vLLM on a MacBook M4
Introduction
Several days ago, I set up Ollama on my MacBook M4, and it works pretty well. At the time, I tried using it with Copilot and the local models codegemma:7b and qwen3:8b. My expectations were not high, since my MacBook Pro M4 is only an entry-level configuration; I just wanted to see how it works. I also learned there are other options, such as vLLM. After comparing the two, I found vLLM to be more flexible, more powerful, production-ready, and widely used in enterprises.
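As a rough sketch of what using vLLM looks like in practice: once the server is started with something like `vllm serve Qwen/Qwen3-8B`, it exposes an OpenAI-compatible endpoint (port 8000 by default), so any standard OpenAI client can talk to it. The model name and port below are assumptions based on the defaults, not values from this post.

```python
# Minimal sketch: query a local vLLM server through its OpenAI-compatible API.
# Assumes the server was launched with something like `vllm serve Qwen/Qwen3-8B`
# (the model name is an assumption) and is listening on the default port 8000.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",  # vLLM does not require a real API key by default
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-8B",  # must match the model the server was launched with
    messages=[
        {"role": "user", "content": "Explain what a KV cache is in one paragraph."}
    ],
)
print(response.choices[0].message.content)
```

This drop-in compatibility with the OpenAI client is part of why vLLM is attractive for enterprise use: tooling written against the cloud API can be pointed at a local server by changing only the base URL.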