Back to home
TurboQuant
2 articles tagged with this topic
TurboQuantKV Cache
Independent KV Cache Evaluation SDK Signals Shift to Inference Infrastructure
KV cache dominates VRAM in long-context inference. An independent evaluation SDK for TurboQuant signals the shift from "can it run?" to "how to run st
May 52 min read
llama.cppQwen3
GPoUr with ~12gb vram and a 3080 getting 40tg/s on qwen3.6 35BA3B w/ 260k ctx
A llama.cpp fork with turbo3 KV cache quantization achieves ~40 tok/s on Qwen3-35 B-A3B with only 12GB VRAM.
Apr 163 min read