1 article tagged with this topic
A llama.cpp fork with turbo3 KV cache quantization achieves ~40 tok/s on Qwen3-35 B-A3B with only 12GB VRAM.