Qwen3
13 articles tagged with this topic
Running AI locally on your phone no longer requires an internet connection — an open-source Android app is making it practical
Pocket LLM v1.4.0 shrinks to ~200 MB, lets users download models on demand, and runs AI fully offline on Android.
Local AI is still going in circles when calling its own tools — the open-source community's real-world experience lags the marketing by a full generation
A 103-upvote Reddit thread exposes how local open-source models consistently hallucinate having completed tasks during tool calling.
Qwen 3 or Gemma 4? Local-deployment users are replacing official benchmarks with hands-on tests — small-model selection enters a "scenario-first" era
A Reddit thread comparing Qwen 3 35B and Gemma 4 26B reveals a shift: users now trust personal testing over official benchmarks.
Should you turn off "thinking mode" when running AI coding locally? A practical question worth clarifying
Should you disable thinking mode when running Qwen3 locally for coding? A real debate with structural implications for AI dev toolchains.
Qwen 3.6 is the first local model that actually feels worth the effort for me
Alibaba's Qwen3.6 35B-A3B runs Q8 at 170 tokens/sec with full 260K context on dual consumer GPUs.
GPU poor with ~12GB VRAM and a 3080 getting 40 tg/s on Qwen3.6 35B-A3B w/ 260k ctx
A llama.cpp fork with turbo3 KV cache quantization achieves ~40 tok/s on Qwen3-35B-A3B with only 12GB VRAM.
Speculative Decoding on AWS Trainium2 Cuts LLM Latency Up to 3x
AWS benchmarks show speculative decoding with vLLM on Trainium2 reduces inter-token latency up to 3x for decode-heavy workloads.
Why do some small/medium models fail at grammar-checking tasks?
Gemma 4B, GPT-OSS-20B, and Qwen3-80B hallucinate spelling errors in grammatically correct sentences.
A PAI-Based Solution for Agent Data Construction and Model Distillation
Alibaba Cloud PAI team open-sources EasyDistill, a ReAct-based data synthesis and model distillation toolkit validated on Qwen3 small models.
Controlling Gemma 4 Thinking Tokens via System Prompts
Users struggle to reliably toggle Gemma 4's reasoning mode via system prompts, unlike Qwen-30B-A3B.
llama.cpp Q8_0 Gets 3.1x Speedup on Intel Arc GPUs via SYCL Fix
A 200-line SYCL patch fixes missing reorder optimization for Q8_0, boosting Arc B70 from 4.88 to 15.24 t/s.
Harmonic-9B: Two-Stage Qwen3-9B Fine-Tune for Agent Use Cases
Community researcher releases Harmonic-9B, a staged fine-tune of Qwen3-9B targeting reliable tool-calling and structured reasoning.
Qwen3.6-397B-A17B: First Open Model to Match Claude Sonnet in Real Use
Community testing finds Qwen3.6-397B-A17B matches Claude Sonnet reliability in real tasks, beating GLM-5.1 and Kimi-k2.5.