quantization

2 articles tagged with this topic

NVIDIA NVFP4 Puts 26B Model on Consumer GPU With Under 1% Accuracy Loss

NVIDIA's NVFP4 Gemma-4-26B shrinks to 18.8 GB to fit consumer GPUs, with under 0.7% accuracy loss. 4-bit is now the sweet spot, but it also brings ecosystem lock-in.

May 1 · 2 min read

Qwen3.5-9B GGUF Quant Rankings: Q8_0 Dominates KLD Scores

KLD benchmarks across community GGUF quants show Q8_0 variants clustering near 0.001 KLD, with quality degrading sharply below Q5.

Apr 14 · 3 min read