GGUF
11 articles tagged with this topic
Qwen3.6 GGUF Benchmarks
Unsloth claims top KLD-vs-disk-space performance for Qwen3.6-35B-A3B quants in 21 of 22 Pareto frontier comparisons.
Gemma 4 and Qwen 3.5 GGUFs: Detailed Analysis by oobabooga
Oobabooga published 5 benchmark reports, each covering 70-90 GGUF quants of Gemma 4 and Qwen 3.5 models, using a KL divergence methodology.
Qwen3.5-9B GGUF Quant Rankings: Q8_0 Dominates KLD Scores
KLD benchmarks across community GGUF quants show Q8_0 variants cluster near 0.001 KLD, with quality degrading sharply below Q5.
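For context on the metric behind these rankings, here is a minimal sketch of per-token KL divergence between a full-precision reference distribution `p` and a quantized model's distribution `q`. This is an illustration of the formula only, not the benchmark's actual implementation (llama.cpp's tooling computes it over a full evaluation corpus):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// D_KL(p || q) = sum_i p_i * log(p_i / q_i), for one next-token distribution.
// A score near 0.001 means the quant's output distribution barely deviates
// from the full-precision model's.
double token_kld(const std::vector<double>& p, const std::vector<double>& q) {
    double kld = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        if (p[i] > 0.0) kld += p[i] * std::log(p[i] / (q[i] + 1e-12));
    }
    return kld;
}
```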
On-Device AI Model Deployment in Practice, Part 5 (Loading Large Models on Android)
Step-by-step JNI bridge implementation for running quantized LLMs on Android using llama.cpp.
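As a hedged sketch of what such a bridge's native side can look like: the Java/Kotlin class and method names below (`com.example.LlamaBridge.loadModel`) are hypothetical, while the llama.cpp calls follow a recent `llama.h`; the article's actual implementation may differ.

```cpp
#include <jni.h>
#include "llama.h"  // llama.cpp C API header

// Native side of a hypothetical bridge.
// Kotlin side would declare:  external fun loadModel(path: String): Long
extern "C" JNIEXPORT jlong JNICALL
Java_com_example_LlamaBridge_loadModel(JNIEnv* env, jobject /*thiz*/, jstring jpath) {
    llama_backend_init();  // backend init; do this once per process in real code
    const char* path = env->GetStringUTFChars(jpath, nullptr);
    llama_model_params mparams = llama_model_default_params();
    llama_model* model = llama_model_load_from_file(path, mparams);  // nullptr on failure
    env->ReleaseStringUTFChars(jpath, path);
    return reinterpret_cast<jlong>(model);  // opaque handle for the managed side
}
```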
Unsloth Releases Full GGUF Quant Suite for MiniMax M2.7
Unsloth uploads 22 GGUF quantizations of MiniMax M2.7, ranging from 1-bit (60.7 GB) to BF16 (457 GB).
MiniMax-M2.7 229B MoE Gets First GGUF Quants for Apple Silicon
MiniMax-M2.7 (229B MoE) quantized to Q3_K_L (110GB) and Q8_0 (243GB) GGUF formats, now on HuggingFace.
Gemma 4 Local CUDA Setup: Precision Traps and Real Benchmarks
Running Gemma 4 locally on CUDA requires strict dtype matching at KV cache boundaries or output degenerates silently.
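One way to pin those precisions with llama.cpp's C API is to set the KV cache element types explicitly when creating the context. A minimal sketch, assuming a recent `llama.h` (the article's exact Gemma 4 setup may differ):

```cpp
#include "llama.h"

// Force both halves of the KV cache to the same element type so attention
// reads and writes agree on precision.
llama_context* make_context(llama_model* model) {
    llama_context_params cparams = llama_context_default_params();
    cparams.type_k = GGML_TYPE_F16;  // K cache precision
    cparams.type_v = GGML_TYPE_F16;  // V cache precision, kept identical to K
    return llama_init_from_model(model, cparams);
}
```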
Gemma 4 26B: Q8 mmproj Unlocks 60K+ Context With Vision
Switching from F16 to Q8_0 mmproj on Gemma 4 26B adds ~30K context tokens with no vision quality loss.
HunyuanOCR 1B Runs at 90 t/s on GTX 1060 via GGUF
Tencent's HunyuanOCR 1B model runs at 90 tokens/sec on a GTX 1060 via GGUF, enabling local OCR on budget hardware.
Run Qwen3-Coder 80B Locally at 54GB With Apex Quantization
A community GGUF quantization shrinks Qwen3-Coder 80B to 54.1GB, making fast local coding inference practical.
Harmonic-9B: Two-Stage Qwen3-9B Fine-Tune for Agent Use Cases
Community researcher releases Harmonic-9B, a staged fine-tune of Qwen3-9B targeting reliable tool-calling and structured reasoning.