Model Quantization

4 articles tagged with this topic

Updating 1% Params: Fine-Tuning & Quantization Slash Custom LLM Deployment Barriers

Fine-tuning turns LLMs into specialists; quantization trims them down. LoRA updates just 1% of params, enabling SMEs to customize AI with consumer GPU

May 5 · 2 min read

APEX, Qwen

APEX Quantizes 25 Models: 10B-Param AI on Home GPUs Flattens Compute Barrier

APEX quantizes 25+ MoE models with new I-Nano tier. 10B-param AI now runs on single consumer GPUs, slashing local deployment costs.

May 5 · 1 min read

QAT, Model Quantization

AI Quantization Ditches Full Downgrades for Mixed-Precision Topology

Converting a whole model from 16-bit to 8-bit sharply degrades precision. A new "equivalent topology" keeps an 8-bit base and upgrades only the sensitive layers to 16-bit, balancing speed and preci

May 1 · 2 min read

Qwen, Unsloth

Qwen3.6-27B Quantized Fits Single Consumer GPU: Local Deployment Sweet Spot

Unsloth Q5-quantized Qwen3.6-27B runs stably on a single RTX 5090 across 19 rounds. Mid-size model local deployment is hitting the cost-capability swe

May 1 · 2 min read