GGML

1 article tagged with this topic

GPoUr with ~12gb vram and a 3080 getting 40tg/s on qwen3.6 35BA3B w/ 260k ctx

A llama.cpp fork with turbo3 KV cache quantization achieves ~40 tok/s on Qwen3-35 B-A3B with only 12GB VRAM.