Back to Home
Unsloth
Found 5 articles with this tag
Unsloth · Qwen3.6
Qwen3.6 GGUF Benchmarks
Unsloth claims top KLD-vs-disk-space performance for Qwen3.6-35B-A3B quants in 21 of 22 Pareto-frontier comparisons.
Apr 17 · 3 min
llama.cpp · Qwen3
GPU poor with ~12GB VRAM and a 3080 getting 40 tg/s on Qwen3.6 35B-A3B w/ 260k ctx
The turboquant branch of llama.cpp, using turbo3 KV cache quantization, runs Qwen3-35B-A3B at roughly 40 tok/s on a single RTX 3080 with 12GB of VRAM while supporting a 260k context window.
Apr 16 · 1 min
Unsloth · MiniMax-M2.7
Unsloth Releases Complete GGUF Quantization Suite for MiniMax M2.7
Unsloth has uploaded 22 GGUF quantizations of MiniMax M2.7, spanning the full range from 1-bit (60.7 GB) to BF16 (457 GB) and substantially lowering the barrier to local deployment.
Apr 12 · 1 min
llama.cpp · Distributed Training
Local Inference vs Distributed Training: Where the Real Gap Is
Indie devs run models locally, but training still requires datacenter scale. Can distributed training ever close that gap?
Apr 5 · 2 min
Gemma 4 · Qwen3.5
Gemma 4 27B vs Qwen 3.5 27B: SVG Generation Benchmark
Reddit users compare Gemma 4 31B and Qwen 3.5 27B Q4 quants on SVG creation, coding, and function calling tasks.
Apr 5 · 2 min