Quantization
3 articles tagged with this topic
llama.cpp, llama-bench
llama.cpp llama-bench Adds -fitc and -fitt Benchmark Flags
llama-bench gains -fitc and -fitt flags from build b4679, enabling finer control over benchmark timing output.
Apr 6 · 1 min read
llama.cpp, Qwen Coder
APEX Quantization vs K-Quants: Why MoE Coding Models Need Different Compression
APEX quantization targets MoE architecture coherence layers at Q8, outperforming generic K-quants for multi-file coding agents.
Apr 6 · 2 min read
REAP, Quantization
35% REAP Quantization Runs 397B Model on 96GB GPU
A community researcher achieved usable quality from a 397B parameter model using 35% REAP quantization on a 96GB GPU.
Apr 5 · 1 min read