Qwen3
13 articles tagged with this topic
Running AI locally on your phone no longer requires an internet connection — an open-source Android app is making it practical
Pocket LLM v1.4.0 shrinks to ~200 MB, lets users download models on demand, and runs AI fully offline on Android.
Local AI is still going in circles when calling its own tools — the open-source community's real-world experience lags the marketing by a full generation
A 103-upvote Reddit thread exposes how local open-source models consistently hallucinate having completed tasks during tool calling.
Qwen 3 or Gemma 4? Local-deployment users are replacing official benchmarks with hands-on tests — small-model selection enters a "scenario-first" era
A Reddit thread comparing Qwen 3 35B and Gemma 4 26B reveals a shift: users now trust personal testing over official benchmarks.
Should you turn off "thinking mode" when running AI coding locally? A practical question worth clarifying
Should you disable thinking mode when running Qwen3 locally for coding? A real debate with structural implications for AI dev toolchains.
Qwen 3.6 is the first local model that actually feels worth the effort for me
Alibaba's Qwen3.6 35B-A3B runs Q8 at 170 tokens/sec with full 260K context on dual consumer GPUs.
GPU poor with ~12GB VRAM and a 3080 getting 40 tg/s on Qwen3.6 35B-A3B w/ 260k ctx
A llama.cpp fork with turbo3 KV cache quantization achieves ~40 tok/s on Qwen3-35B-A3B with only 12GB VRAM.
Speculative Decoding on AWS Trainium2 Cuts LLM Latency Up to 3x
AWS benchmarks show speculative decoding with vLLM on Trainium2 reduces inter-token latency up to 3x for decode-heavy workloads.
Why do some small/medium models fail at grammar-checking tasks?
Gemma 4B, GPT-OSS-20B, and Qwen3-80B hallucinate spelling errors in grammatically correct sentences.
A PAI-Based Solution for Agent Data Construction and Model Distillation
Alibaba Cloud PAI team open-sources EasyDistill, a ReAct-based data synthesis and model distillation toolkit validated on Qwen3 small models.
Controlling Gemma 4 Thinking Tokens via System Prompts
Users struggle to reliably toggle Gemma 4's reasoning mode via system prompts, unlike Qwen-30B-A3B.
llama.cpp Q8_0 Gets 3.1x Speedup on Intel Arc GPUs via SYCL Fix
A 200-line SYCL patch fixes missing reorder optimization for Q8_0, boosting Arc B70 from 4.88 to 15.24 t/s.
Harmonic-9B: Two-Stage Qwen3-9B Fine-Tune for Agent Use Cases
Community researcher releases Harmonic-9B, a staged fine-tune of Qwen3-9B targeting reliable tool-calling and structured reasoning.
Qwen3.6-397B-A17B: First Open Model to Match Claude Sonnet in Real Use
Community testing finds Qwen3.6-397B-A17B matches Claude Sonnet reliability in real tasks, beating GLM-5.1 and Kimi-k2.5.