GGUF
9 articles tagged with this topic
Google Gemma 4 Fixes Chat Template — Local LLM Usability Inches Forward
Google fixed Gemma 4's chat template bug; community quantized versions updated. Not major news, but proves local AI usability inches up via detail ref
Mistral Local GGUF Bug Fixed — Open Source QA Gaps Are Bigger Than You Think
Mistral Medium 3.5 GGUF files corrupted, community-fixed. Reveals open source QA gap: APIs tested, local formats not—impacts enterprise deployments.
Qwen3.6 GGUF Benchmarks
Unsloth claims top KLD-vs-disk-space performance for Qwen3.6-35B-A3B quants in 21 of 22 Pareto frontier comparisons.
Gemma 4 and Qwen 3.5 GGUFs: Detailed Analysis by oobabooga
Oobabooga published 5 benchmark reports covering 70-90 GGUF quants each for Gemma 4 and Qwen 3.5 models using KL Divergence methodology.
Qwen3.5-9B GGUF Quant Rankings: Q8_0 Dominates KLD Scores
KLD benchmarks across community GGUF quants show Q8_0 variants cluster near 0.001 KLD, with quality degrading sharply below Q5.
On-Device AI Model Deployment in Practice, Part 5 (Loading Large Models on Android)
Step-by-step JNI bridge implementation for running quantized LLMs on Android using llama.cpp.
Unsloth Releases Full GGUF Quant Suite for MiniMax M2.7
Unsloth uploads 22 GGUF quantizations of MiniMax M2.7, ranging from 1-bit (60.7 GB) to BF16 (457 GB).
MiniMax-M2.7 229B MoE Gets First GGUF Quants for Apple Silicon
MiniMax-M2.7 (229B MoE) quantized to Q3_K_L (110 GB) and Q8_0 (243 GB) GGUF formats, now on Hugging Face.
Gemma 4 Local CUDA Setup: Precision Traps and Real Benchmarks
Running Gemma 4 locally on CUDA requires strict dtype matching at KV cache boundaries or output degenerates silently.