Unsloth
7 articles tagged with this topic
IBM Open-Sources Granite 4.1: 21 Quantized Versions Prove Bottleneck Isn't Size
IBM open-sources Granite 4.1. A 21-version quantization test shows no quality difference: small models' bottleneck is base capability, not compression
Mistral Local GGUF Bug Fixed — Open Source QA Gaps Are Bigger Than You Think
Mistral Medium 3.5 GGUF files corrupted, community-fixed. Reveals open source QA gap: APIs tested, local formats not—impacts enterprise deployments.
Mistral 3.5 Inference Bug Fixed by Open-Source Team — LLM Delivery QA Flashing Red
Unsloth fixed a Mistral Medium 3.5 inference bug from a core config error, exposing absent QA in commercial LLMs. Beware the "community beta" business
Qwen3.6-27B Quantized Fits Single Consumer GPU: Local Deployment Sweet Spot
Unsloth Q5-quantized Qwen3.6-27B runs stably on a single RTX 5090 across 19 rounds. Mid-size model local deployment is hitting the cost-capability swe
Qwen3.6 GGUF Benchmarks
Unsloth claims top KLD-vs-disk-space performance for Qwen3.6-35B-A3B quants in 21 of 22 Pareto frontier comparisons.
GPoUr with ~12gb vram and a 3080 getting 40tg/s on qwen3.6 35BA3B w/ 260k ctx
A llama.cpp fork with turbo3 KV cache quantization achieves ~40 tok/s on Qwen3.6-35B-A3B with only 12GB VRAM.
Unsloth Releases Full GGUF Quant Suite for MiniMax M2.7
Unsloth uploads 22 GGUF quantizations of MiniMax M2.7, ranging from 1-bit (60.7 GB) to BF16 (457 GB).