TurboQuant
3 articles found with this tag
llama.cpp · Qwen3
GPoUr with ~12gb vram and a 3080 getting 40tg/s on qwen3.6 35BA3B w/ 260k ctx
The turboquant branch of llama.cpp uses turbo3 KV cache quantization to run Qwen3-35B-A3B at about 40 tok/s on a single RTX 3080 with 12 GB of VRAM, while supporting a 260k-token context window.
Apr 16 · 1 min
llama.cpp · TurboQuant
TurboQuant KV Cache Quantization Beats Baselines on Gemma 4 and Qwen
Community benchmarks show TurboQuant KV quantization achieves near-zero accuracy loss at 3.1 bits on Gemma 4 with 34% long-context speedup.
Apr 5 · 2 min
TurboQuant · Vector Quantization
TurboQuant and Vector Quantization: A Beginner's Breakdown
A Reddit user unpacks Google's TurboQuant blog from first principles, making LLM quantization accessible without heavy prerequisites.
Apr 5 · 1 min