Qwen
30 articles tagged with this topic
Consumer GPU Hits 100K Context: Local LLM Hardware Thresholds Drop Fast
We see an RTX 3090 run a 27B model, 100K context, 50 tokens/s via quant+MTP+KV compression. Consumer inference now rivals last year's enterprise setup
Local Small Models Ace Junior IT Ops: 30-Year Vet Predicts Human-Machine Shift
Qwen3.6 27B + Agent did 3 hours of junior IT ops in 1.5 hours. Local small models have crossed the viability threshold for junior admin, shifting ente
Weekend Solidity Fine-Tune Beats Opus: Vertical Small Models' ROI Moment
A developer fine-tuned Qwen into a 27B Solidity model, beating Claude Opus on coding benchmarks. The signal: cheap small vertical models are catching
65% of Code Tasks Run Locally — API Bills Drop 74%, Most Pay a Cloud Laziness Tax
Devs found 65% of daily coding tasks run fine on local small models; task routing cuts API costs by 74%. Most overpay for cloud compute out of sheer l
APEX Quantizes 25 Models: 10B-Param AI on Home GPUs Flattens Compute Barrier
APEX quantizes 25+ MoE models with new I-Nano tier. 10B-param AI now runs on single consumer GPUs, slashing local deployment costs.
llama.cpp MTP Hits Beta: Local LLM Inference Speed Gap Narrowing
llama.cpp MTP beta supports Qwen3.5. With tensor parallelism maturing, the local-cloud inference speed gap is narrowing, making local LLM deployment m
Laid-Off Researcher, 21-Page Local AI Report: Agents Hit Usable-But-Slow Phase
A 15-year policy researcher used local open-source AI to autonomously generate a professional report in 5 hours. AI deep research hits the 'usable but
NVIDIA RTX A5000 Pro 48GB Arrives: Local LLMs No Longer Need Dual GPUs
NVIDIA's $4,500 RTX A5000 Pro 48GB runs quantized Qwen 27B on a single card. Simpler than dual-GPU setups for local AI, but value requires careful mat
Qwen Fine-Tune Learns to Refuse — Anti-Sycophancy Is No Longer Just Talk
An open-source Qwen3-32B fine-tune deliberately fights AI sycophancy by injecting negativity bias. Not a stunt—a serious response to a long-ignored in
Qwen Open-Sources SAE: Decoding & Steering LLMs, China Enters Interpretability
Qwen open-sourced an 80K-feature SAE on HuggingFace. For the first time, a Chinese team makes LLM internals dissectible & steerable—a major interpreta
Qwen3.6 35B Beats 27B in Speed and Quality: Parameter Count Is Unreliable
Developers found Qwen3.6 35B outperforms 27B in quality and speed, breaking the "smaller is faster" myth. Benchmark data, not parameter counts, should
New Hugging Face Visualizer Cracks Open AI Black Boxes Without Code
hfviewer.com visualizes Hugging Face model architectures interactively. It replaces code with intuitive graphics, lowering the barrier to grasping AI
Qwen3.6-27B Ties Coder-Next: Pick Models by Scenario, Not Benchmarks
20-hour test: Qwen3.6-27B ties MoE Coder-Next overall but differs by task. Disabling "thinking mode" surprisingly boosts stability. Scenario fit beats
Qwen3.6 Single-GPU Deep Search 95.7%: Local Matches Perplexity, Tool Use Beats Size
Open-source LDR hits 95.7% deep search on a single 3090, matching Perplexity cloud. Tool calling beats model size for agents; local AI search is now p
Qwen 3.6 Wins Benchmarks, Fails Reality: Benchmaxing Distorts AI Perception
Qwen 3.6 won benchmarks but lost to Gemma 4 in practice, burning 8000+ tokens in a loop. Benchmaxing distorts AI perception; firms must shift to real-
Open-Source Hybrid Recall Tool Gives Agents Memory Without Giant Contexts
Qwen3.5-4B MCP tool uses BM25+vector hybrid recall for Agent project memory. Focus shifts from "bigger context" to "better retrieval," cutting deploym
Single 3090 Runs Qwen3 Natively on Windows: Local LLMs Drop Linux Requirement
Developers ran Qwen3.6-27B natively on Windows at 72 tok/s. This slashes deployment barriers—enterprises can run LLMs on existing GPUs without Linux.
Ollama Runs Local LLMs on Mac with One Command — PCs Are the New AI Gateway
Ollama runs Qwen & DeepSeek locally on Mac via one command. MLX integration doubles inference speed. When deployment = app install, cloud-free AI may
Qwen 3.6 Replaces Copilot Locally: Zero API Cost, But Novices Beware
A dev used Qwen 3.6-27B quantized + RTX 6000 Pro to code all day with zero API calls. Local models hit the 'good enough' threshold, provided you can c
Qwen3.6-27B Quantized Fits Single Consumer GPU: Local Deployment Sweet Spot
Unsloth Q5-quantized Qwen3.6-27B runs stably on a single RTX 5090 across 19 rounds. Mid-size model local deployment is hitting the cost-capability swe
Gemma 4 Beats Qwen 3.6 With 1/5 The Tokens — Local AI Era Demands Efficiency
A Reddit test shows Gemma 4 beats Qwen 3.6 on a Pac-Man prompt using 1/5 the tokens and time. We argue: in local deployment, efficiency now trumps raw
Alibaba's Qwen 3.6 Max Quietly Launches, Tops Chinese Model Rankings: Open Source or Closed Source Is the Real Question
Alibaba's Qwen 3.6 Max quietly launched in preview, scoring highest among Chinese models — but its open-source status remains undecided.
CrewAI Installed but Won't Run? Behind One Deployment Guide, the Reality That the Barrier to AI Multi-Agent Tools Hasn't Come Down
A 3,000-word CrewAI setup guide went viral on Juejin, proof that multi-agent frameworks are hot but nowhere near enterprise-ready.
Developers Start Replacing Claude with Chinese Open-Source Models as Daily Coding Assistants: Performance Gap Narrows to "Good Enough"
Developers on Reddit are seriously evaluating Alibaba's Qwen-35B-A3B as a local replacement for Claude Opus 4.7 in daily coding workflows.
Qwen 3.6 35B Runs "Browser OS" Locally: Open-Source Models Are Closing the Gap
A developer ran Alibaba's Qwen 3.6 35B locally to achieve "Browser OS": AI orchestrating a browser like an OS, no cloud needed.
On a Single Consumer GPU, AI Rewrote a User's Entire Accounting App: Alibaba's New Qwen Model Makes "Running Locally" Start to Feel Real
Alibaba's Qwen3.6-35B-A3B rewrote a full accounting app on a single RTX 5070 Ti in under an hour—where older models failed.
Alibaba Releases Qwen3.6-35B-A3B Mixture-of-Experts Model
Alibaba's Qwen team releases Qwen3.6-35B-A3B, a 35B-parameter MoE model activating 3B parameters per token.
Qwen3.6-35B-A3B released!
Alibaba's Qwen team releases a 35B sparse MoE model with only 3B active params under Apache 2.0.
Fine-Tune Qwen 2.5 for Tool Calling with SageMaker RLVR
AWS SageMaker serverless RLVR fine-tuning improved Qwen 2.5 7B tool-call accuracy by 57% without GPU management.
37 LLMs Benchmarked on MacBook Air M5 32GB: Full Speed Results
Community benchmark of 37 local LLMs on M5 Air 32GB using llama-bench reveals MoE models as clear winners for speed-to-quality ratio.