What Happened

A community benchmarker on r/LocalLLaMA published updated KL divergence (KLD) evaluations across more than 35 community GGUF quantizations of Qwen3.5-9B; the post had 56 upvotes as of this writing. The analysis measures each quant's probability-distribution drift against the BF16 baseline, providing a dataset-independent fidelity metric for local deployment decisions.

The methodology explicitly favors KLD over perplexity (PPL): per the author, "PPL is noisy as it can get a better score by pure luck. KLD is better as it is not relying on the dataset but on the baseline."
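The metric itself is simple: for each token position, how far the quantized model's next-token distribution drifts from the BF16 baseline's. A minimal sketch, for illustration only — this is not the author's tooling, and the four-token distributions are invented:

```python
# Illustrative per-token KL divergence between a BF16 baseline next-token
# distribution (P) and a quantized model's distribution (Q).
# The reported scores are means of this quantity over an evaluation corpus.
import math

def kl_divergence(p_baseline, q_quant, eps=1e-12):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats."""
    return sum(p * math.log(p / max(q, eps))
               for p, q in zip(p_baseline, q_quant) if p > 0)

# Hypothetical 4-token vocabulary:
p = [0.70, 0.20, 0.08, 0.02]   # BF16 baseline
q = [0.68, 0.21, 0.09, 0.02]   # quantized model, slightly drifted
print(round(kl_divergence(p, q), 6))   # 0.001111 -- on the order of the Q8_0 scores below
```

In practice llama.cpp's perplexity tool can report mean KLD against saved baseline logits (its `--kl-divergence` mode), which is presumably how numbers at this scale were produced, though the post does not say.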

Why It Matters

For engineering teams running Qwen3.5-9B on-premise or at the edge, quantization selection directly impacts inference fidelity, VRAM budget, and storage cost. This comparison gives practitioners a ranked, reproducible basis for that decision rather than defaulting to whatever quant a given hub happens to surface first.

The data shows a clear fidelity cliff: Q8_0 and Q6_K variants maintain KLD scores below 0.005, while Q4-range quants jump to 0.015–0.026, a 3–20x increase in distribution drift. Teams optimizing for accuracy-per-gigabyte will find the Q6_K tier offers the best tradeoff before quality degrades substantially.
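The selection problem this implies is a simple constrained search: the smallest file whose drift stays under your fidelity cap. A hypothetical sketch using size/KLD figures reported later in this post (the helper and its name are my own, not the benchmarker's):

```python
# (size in GiB, mean KLD) per quant tier, copied from the results below.
quants = {
    "Q8_0":   (8.873, 0.001198),
    "Q6_K":   (7.134, 0.002813),
    "Q5_K_M": (6.392, 0.006604),
    "Q4_K_M": (5.485, 0.016754),
}

def smallest_under_kld_cap(table, cap):
    """Return the smallest quant whose KLD is within the cap, else None."""
    ok = [(size, name) for name, (size, kld) in table.items() if kld <= cap]
    return min(ok)[1] if ok else None

print(smallest_under_kld_cap(quants, 0.005))   # Q6_K  (smallest below the cliff)
print(smallest_under_kld_cap(quants, 0.010))   # Q5_K_M
```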

The Technical Detail

Top-Tier: Q8_0 Cluster (KLD < 0.002)

All Q8_0 variants score within a tight band. The top performers:

  • eaddario/Qwen3.5-9B-Q8_0 — 8.873 GiB, KLD: 0.001198 (lowest recorded)
  • unsloth/Qwen3.5-9B-UD-Q8_K_XL — 12.083 GiB, KLD: 0.001243
  • bartowski/Qwen_Qwen3.5-9B-Q8_0 — 8.89 GiB, KLD: 0.001405
  • lmstudio-community/Qwen3.5-9B-Q8_0 — 8.873 GiB, KLD: 0.001410

Note: unsloth/Qwen3.5-9B-UD-Q8_K_XL achieves comparable fidelity at 12.083 GiB, 36% larger than standard Q8_0 for a marginal KLD gain. For VRAM-constrained deployments, standard Q8_0 is the clear default.

Mid-Tier: Q6_K Range (KLD 0.002–0.005)

Q6_K variants deliver meaningful size savings with contained fidelity loss:

  • unsloth/Qwen3.5-9B-UD-Q6_K_XL — 8.156 GiB, KLD: 0.001910
  • bartowski/Qwen_Qwen3.5-9B-Q6_K_L — 7.592 GiB, KLD: 0.002371
  • bartowski/Qwen_Qwen3.5-9B-Q6_K — 7.134 GiB, KLD: 0.002813

The bartowski Q6_K variant cuts 1.76 GiB versus the same publisher's Q8_0 for a KLD increase of roughly 0.0014, an acceptable tradeoff for most production use cases where exact output-distribution matching is not required.

Q5 Range: Approaching the Cliff (KLD 0.006–0.010)

Q5 variants show progressive degradation:

  • bartowski/Qwen_Qwen3.5-9B-Q5_K_L — 6.976 GiB, KLD: 0.006068
  • bartowski/Qwen_Qwen3.5-9B-Q5_K_M — 6.392 GiB, KLD: 0.006604
  • bartowski/Qwen_Qwen3.5-9B-Q5_K_S — 6.078 GiB, KLD: 0.008110

KLD roughly doubles between Q6_K and Q5_K_M for most bartowski variants, indicating non-trivial information loss at this compression level.

Q4 and Below: Sharp Degradation (KLD 0.015–0.026)

The Q4 tier shows KLD scores roughly 10–20x higher than the Q8_0 baselines:

  • bartowski/Qwen_Qwen3.5-9B-Q4_K_L — 6.188 GiB, KLD: 0.015064
  • bartowski/Qwen_Qwen3.5-9B-Q4_K_M — 5.485 GiB, KLD: 0.016754
  • bartowski/Qwen_Qwen3.5-9B-IQ4_XS — 4.846 GiB, KLD: 0.025705

Notably, eaddario/Qwen3.5-9B-Q6_K scores an anomalous KLD of 0.021010 despite being a Q6 quantization (worse than several Q4 variants), suggesting a packaging or quantization artifact specific to that build. Engineers should treat this outlier with caution and verify independently.

PPL Scores

Perplexity scores across all variants cluster tightly between approximately 19.17 and 19.71, reinforcing the author's claim that PPL is insufficiently discriminating for quant selection at this compression range. KLD spreads are a more reliable signal.
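The gap in discriminating power is visible in the reported numbers themselves. A back-of-envelope sketch (figures taken from the post; the comparison framing is mine):

```python
# PPL band vs KLD spread across the same set of quants.
ppl_lo, ppl_hi = 19.17, 19.71           # reported PPL extremes
kld_lo, kld_hi = 0.001198, 0.025705     # best Q8_0 vs IQ4_XS

ppl_rel_spread = (ppl_hi - ppl_lo) / ppl_lo   # ~2.8% relative spread
kld_ratio = kld_hi / kld_lo                   # ~21.5x spread

print(f"PPL relative spread: {ppl_rel_spread:.1%}")   # 2.8%
print(f"KLD ratio: {kld_ratio:.1f}x")                 # 21.5x
```

A ~3% perplexity band is easily swamped by corpus noise, while a 20x KLD spread cleanly separates the tiers, which is the author's point.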

What To Watch

  • Qwen3.5 broader release activity: Community quant proliferation typically accelerates in the 30 days following a model drop. Expect additional imatrix-based IQ quants that may challenge current Q5/Q4 KLD standings.
  • Unsloth UD-series expansion: The UD-Q8_K_XL and UD-Q6_K_XL variants use non-standard quantization schemas that trade file size for fidelity. Watch for Unsloth publishing updated UD quants lower in the bit range.
  • llama.cpp quantization improvements: Any upstream changes to GGUF quantization kernels in llama.cpp would invalidate current rankings — monitor the llama.cpp repo for quant-related PRs over the next 30 days.
  • Replication: The methodology and tooling for this KLD eval were not published in the source post. Independent replication would strengthen confidence in the rankings, particularly the eaddario Q6_K anomaly.