Qwen3.5
9 articles tagged with this topic
Gemma 4 and Qwen 3.5 GGUFs: Detailed Analysis by oobabooga
Oobabooga published five benchmark reports, each covering 70-90 GGUF quants of Gemma 4 and Qwen 3.5 models, using KL Divergence methodology.
Qwen3.5-9B GGUF Quant Rankings: Q8_0 Dominates KLD Scores
KLD benchmarks across community GGUF quants show Q8_0 variants clustering near 0.001 KLD, with quality degrading sharply below Q5.
DFlash speculative decoding on Apple Silicon: 4.1x on Qwen3.5-9B, now open source (MLX, M5 Max)
Open-source DFlash achieves a 4.13x speedup on Qwen3.5-9B using MLX on an M5 Max, with an 89.4% token acceptance rate.
Hitoku, an open-source local context-aware macOS assistant with Qwen3.5/Gemma4
The open-source macOS assistant runs Gemma 4 and Qwen 3.5 fully on-device with screen and document context.
Qwen 3.5 35B Benchmarks: Vulkan vs ROCm on AMD Strix Halo
On the AMD Ryzen AI MAX+ 395, Vulkan wins on token generation (~57.5 t/s) while ROCm leads on prompt processing (~1052 t/s).
Gemma-4 E4B Vision Benchmarked: Scores 0.27 vs Qwen3.5-4B's 0.5
Community testing shows Gemma-4 E4B scores 0.27 on 100 vision tasks vs Qwen3.5-4B's baseline 0.5, raising red flags for multimodal use.
Qwen3.5 vs Gemma4 vs Cloud LLMs: Python Turtle Drawing Benchmark
A Reddit user benchmarks local and cloud LLMs on Python turtle graphics, revealing that Gemma 4 and Gemini share a similar visual style.
Gemma 4 27B vs Qwen 3.5 27B: SVG Generation Benchmark
Reddit users compare Gemma 4 31B and Qwen 3.5 27B Q4 quants on SVG creation, coding, and function calling tasks.
Run Claude Code Fully Offline Using Qwen3.5 27B and llama.cpp
A developer runs the Claude Code CLI against a local llama.cpp server using Qwen3.5 27B, achieving 9+ t/s on Strix Halo hardware.