TurboQuant

3 articles tagged with this topic

GPoUr with ~12gb vram and a 3080 getting 40tg/s on qwen3.6 35BA3B w/ 260k ctx

A llama.cpp fork with turbo3 KV cache quantization achieves ~40 tok/s on Qwen3-35B-A3B with only 12GB VRAM.

Apr 16 · 3 min read

llama.cpp · TurboQuant

TurboQuant KV Cache Quantization Beats Baselines on Gemma 4 and Qwen

Community benchmarks show TurboQuant KV quantization achieving near-zero accuracy loss at 3.1 bits on Gemma 4, along with a 34% long-context speedup.

Apr 5 · 2 min read

TurboQuant · Vector Quantization

TurboQuant and Vector Quantization: A Beginner's Breakdown

A Reddit user unpacks Google's TurboQuant blog from first principles, making LLM quantization accessible without heavy prerequisites.

Apr 5 · 1 min read