skip to content
Guil Sa

Getting Started with Local LLMs (Qwen3.6) on Your Mac

/ 3 min read

As of Q2 2026, all you need is 48GB of RAM for heavy work + frontier local LLM that’s very fast (69 tokens/sec is an incredible rate).

Install pi agent harness, then buy $5 of OpenRouter credits, inside pi run /login to configure your OpenRouter key, then pick GLM 5.2 or DeepSeek V4 Flash, then ask pi to help you download Qwen3.6-35B-A3B-UD-Q5_K_XL from HuggingFace (Google it yourself first and understand why HuggingFace is a trusted repository).

Why pi, you may ask? I don’t really like heroes. But yeah, Mario Zechner is my new hero.

Qwen3.6 35B A3B is a marvel of an invention. It’s a mixture of experts model and is much faster than its cousin, the dense model Qwen3.6 27B.

The Q5 means 5-bit quantization, which you don’t need to understand in depth, but it’s better than 4-bit quant, which is already ideal and what experts run.

Once pi downloads the model files, tell it to add the model to pi’s own models.json. Then tell pi to install llama.cpp via homebrew and tell it to teach you how to start/stop the llama.cpp server from the command line.

You’re ready to swap out OpenRouter for your local Qwen3.6 instance. Type /model + press enter. Say hi to it, test basic operations and work with pi to tweak things based on your needs.

Stay away from pi extensions. It helps to know about info stealers (1, 2) to grasp what’s at stake if you ever get hacked. Nowadays, anyone is a target. For example, there’s usually no real need to install web search pi extensions. If your model needs to search the internet, instead of doing that, tell it to download data from places that you trust (otherwise you shouldn’t download it, right?) to your /tmp directory (just tell pi this). Have a conversation with pi about what you pulled from the internet like that. Reason being, unsanitized incoming data is dangerous and it’s how you can get hacked. Don’t get hacked. I have personally seen my agent (using other setups and tools) visit random websites unrelated to my requests. You risk being targeted with a prompt injection. If you really need your local agent to research the web, isolate pi (not the llm server) inside a Docker VM (but try to skip that, don’t be so anxious!).

Good luck and have fun with your Macbook Pro M5 Pro. It’s an amazing machine. Btw, if you want to know why you shouldn’t buy 64GB of RAM (it will not make a difference), that’ll be in my next post.

I’m job hunting btw. If you need nice people on your team, msg me.