GGUF

11 articles tagged with this topic

Unsloth · Qwen3.6

Qwen3.6 GGUF Benchmarks

Unsloth claims top KLD-vs-disk-space performance for Qwen3.6-35B-A3B quants in 21 of 22 Pareto frontier comparisons.

Apr 17 · 3 min read

Gemma 4 · Qwen3.5

Gemma 4 and Qwen 3.5 GGUFs: Detailed Analysis by oobabooga

Oobabooga published 5 benchmark reports covering 70-90 GGUF quants each for Gemma 4 and Qwen 3.5 models using KL Divergence methodology.

Apr 15 · 3 min read

Qwen3.5 · GGUF

Qwen3.5-9B GGUF Quant Rankings: Q8_0 Dominates KLD Scores

KLD benchmarks across community GGUF quants show Q8_0 variants cluster near 0.001 KLD, with quality degrading sharply below Q5.

Apr 14 · 3 min read

llama.cpp · Android

On-Device AI Model Deployment in Practice, Part 5 (Loading Large Models on Android)

Step-by-step JNI bridge implementation for running quantized LLMs on Android using llama.cpp.

Apr 14 · 3 min read

Unsloth · MiniMax-M2.7

Unsloth Releases Full GGUF Quant Suite for MiniMax M2.7

Unsloth uploads 22 GGUF quantizations of MiniMax M2.7, ranging from 1-bit (60.7 GB) to BF16 (457 GB).

Apr 12 · 3 min read

MiniMax-M2.7 · llama.cpp

MiniMax-M2.7 229B MoE Gets First GGUF Quants for Apple Silicon

MiniMax-M2.7 (229B MoE) quantized to Q3_K_L (110GB) and Q8_0 (243GB) GGUF formats, now on HuggingFace.

Apr 12 · 3 min read

Gemma 4 · llama.cpp

Gemma 4 Local CUDA Setup: Precision Traps and Real Benchmarks

Running Gemma 4 locally on CUDA requires strict dtype matching at KV cache boundaries or output degenerates silently.

Apr 7 · 2 min read

llama.cpp · Gemma 4

Gemma 4 26B: Q8 mmproj Unlocks 60K+ Context With Vision

Switching from F16 to Q8_0 mmproj on Gemma 4 26B adds ~30K context tokens with no vision quality loss.

Apr 6 · 2 min read

HunyuanOCR · GGUF

HunyuanOCR 1B Runs at 90 t/s on GTX 1060 via GGUF

Tencent's HunyuanOCR 1B model runs at 90 tokens/sec on a GTX 1060 via GGUF, enabling local OCR on budget hardware.

Apr 6 · 1 min read

Qwen3-Coder · llama.cpp

Run Qwen3-Coder 80B Locally at 54GB With Apex Quantization

A community GGUF quantization shrinks Qwen3-Coder 80B to 54.1GB, making fast local coding inference practical.

Apr 5 · 2 min read

Qwen3 · fine-tuning

Harmonic-9B: Two-Stage Qwen3-9B Fine-Tune for Agent Use Cases

Community researcher releases Harmonic-9B, a staged fine-tune of Qwen3-9B targeting reliable tool-calling and structured reasoning.

Apr 4 · 2 min read