
Qwen

30 articles tagged with this topic

Qwen · RTX 3090

Consumer GPU Hits 100K Context: Local LLM Hardware Thresholds Drop Fast

An RTX 3090 runs a 27B model with 100K context at 50 tokens/s by combining quantization, MTP (multi-token prediction), and KV-cache compression. Consumer inference now rivals last year's enterprise setups.

5d ago · 2 min read
Qwen · Hermes Agent

Local Small Models Ace Junior IT Ops: 30-Year Vet Predicts Human-Machine Shift

Qwen3.6-27B plus an agent completed 3 hours of junior IT ops work in 1.5 hours. Local small models have crossed the viability threshold for junior admin work, shifting ente…

5d ago · 2 min read
Qwen · Solidity

Weekend Solidity Fine-Tune Beats Opus: Vertical Small Models' ROI Moment

A developer fine-tuned Qwen into a 27B Solidity model that beats Claude Opus on coding benchmarks. The signal: cheap, small vertical models are catching…

6d ago · 2 min read
DeepSeek · Qwen

65% of Code Tasks Run Locally, API Bills Drop 74%: Most Pay a Cloud Laziness Tax

Devs found 65% of daily coding tasks run fine on local small models; task routing cuts API costs by 74%. Most overpay for cloud compute out of sheer l…

6d ago · 2 min read
APEX · Qwen

APEX Quantizes 25 Models: 10B-Param AI on Home GPUs Flattens Compute Barrier

APEX quantizes 25+ MoE models with a new I-Nano tier. 10B-parameter AI now runs on a single consumer GPU, slashing local deployment costs.

May 5 · 1 min read
llama.cpp · MTP

llama.cpp MTP Hits Beta: Local LLM Inference Speed Gap Narrowing

llama.cpp MTP beta supports Qwen3.5. With tensor parallelism maturing, the local-cloud inference speed gap is narrowing, making local LLM deployment m…

May 4 · 2 min read
Hermes Agent · Qwen

Laid-Off Researcher, 21-Page Local AI Report: Agents Hit Usable-But-Slow Phase

A policy researcher with 15 years of experience used local open-source AI to autonomously generate a professional report in 5 hours. AI deep research hits the 'usable but…

May 4 · 2 min read
NVIDIA · RTX A5000 Pro

NVIDIA RTX A5000 Pro 48GB Arrives: Local LLMs No Longer Need Dual GPUs

NVIDIA's $4,500 RTX A5000 Pro 48GB runs quantized Qwen 27B on a single card. It is simpler than dual-GPU setups for local AI, but value requires careful mat…

May 4 · 2 min read
Qwen · Assistant_Pepe

Qwen Fine-Tune Learns to Refuse: Anti-Sycophancy Is No Longer Just Talk

An open-source Qwen3-32B fine-tune deliberately fights AI sycophancy by injecting a negativity bias. Not a stunt, but a serious response to a long-ignored in…

May 3 · 2 min read
Qwen · SAE

Qwen Open-Sources SAE: Decoding & Steering LLMs, China Enters Interpretability

Qwen open-sourced an 80K-feature SAE (sparse autoencoder) on Hugging Face. For the first time, a Chinese team makes LLM internals dissectible and steerable, a major interpreta…

May 3 · 2 min read
Qwen · local deployment

Qwen3.6 35B Beats 27B in Speed and Quality: Parameter Count Is Unreliable

Developers found Qwen3.6 35B outperforms 27B in quality and speed, breaking the "smaller is faster" myth. Benchmark data, not parameter counts, should…

May 3 · 2 min read
hfviewer · Hugging Face

New Hugging Face Visualizer Cracks Open AI Black Boxes Without Code

hfviewer.com visualizes Hugging Face model architectures interactively. It replaces code with intuitive graphics, lowering the barrier to grasping AI…

May 3 · 2 min read
Qwen · Coder-Next

Qwen3.6-27B Ties Coder-Next: Pick Models by Scenario, Not Benchmarks

In a 20-hour test, Qwen3.6-27B ties the MoE model Coder-Next overall but differs by task. Disabling "thinking mode" surprisingly boosts stability. Scenario fit beats…

May 3 · 3 min read
Qwen · LDR

Qwen3.6 Single-GPU Deep Search 95.7%: Local Matches Perplexity, Tool Use Beats Size

Open-source LDR scores 95.7% on deep search on a single 3090, matching cloud-based Perplexity. For agents, tool calling matters more than model size; local AI search is now p…

May 2 · 2 min read
Qwen · Gemma

Qwen 3.6 Wins Benchmarks, Fails Reality: Benchmaxing Distorts AI Perception

Qwen 3.6 won benchmarks but lost to Gemma 4 in practice, burning 8,000+ tokens in a loop. Benchmaxing distorts AI perception; firms must shift to real-…

May 2 · 2 min read
Qwen · MCP

Open-Source Hybrid Recall Tool Gives Agents Memory Without Giant Contexts

A Qwen3.5-4B MCP tool uses BM25 + vector hybrid recall for agent project memory. The focus shifts from "bigger context" to "better retrieval," cutting deploym…

May 2 · 2 min read
Qwen · vLLM

Single 3090 Runs Qwen3 Natively on Windows: Local LLMs Drop Linux Requirement

Developers ran Qwen3.6-27B natively on Windows at 72 tok/s. This slashes deployment barriers: enterprises can run LLMs on existing GPUs without Linux.

May 2 · 2 min read
Ollama · Qwen

Ollama Runs Local LLMs on Mac with One Command: PCs Are the New AI Gateway

Ollama runs Qwen and DeepSeek locally on a Mac with one command, and MLX integration doubles inference speed. When deployment is as simple as installing an app, cloud-free AI may…

May 2 · 1 min read
Qwen · Alibaba Cloud

Qwen 3.6 Replaces Copilot Locally: Zero API Cost, But Novices Beware

A dev used a quantized Qwen 3.6-27B on an RTX 6000 Pro to code all day with zero API calls. Local models have hit the 'good enough' threshold, provided you can c…

May 2 · 2 min read
Qwen · Unsloth

Qwen3.6-27B Quantized Fits Single Consumer GPU: Local Deployment Sweet Spot

Unsloth's Q5-quantized Qwen3.6-27B ran stably on a single RTX 5090 across 19 rounds. Local deployment of mid-size models is hitting the cost-capability swe…

May 1 · 2 min read
Qwen · Gemma

Gemma 4 Beats Qwen 3.6 With 1/5 the Tokens: Local AI Era Demands Efficiency

A Reddit test shows Gemma 4 beating Qwen 3.6 on a Pac-Man prompt using 1/5 the tokens and time. We argue: in local deployment, efficiency now trumps raw…

May 1 · 2 min read
Qwen · Alibaba

Alibaba's Qwen 3.6 Max Quietly Launches and Tops the Chinese Model Rankings, but Open vs. Closed Source Is the Real Question

Alibaba's Qwen 3.6 Max quietly launched in preview, scoring highest among Chinese models, but its open-source status remains undecided.

Apr 20 · 2 min read
CrewAI · Qwen

CrewAI Installed but Won't Run? Behind One Deployment Guide Lies the Reality That Multi-Agent AI Tools Are Still Hard to Adopt

A 3,000-word CrewAI setup guide went viral on Juejin: proof that multi-agent frameworks are hot, but nowhere near enterprise-ready.

Apr 20 · 3 min read
Qwen · Claude

Developers Start Replacing Claude with Chinese Open-Source Models as Daily Coding Assistants: The Performance Gap Is Narrowing to "Good Enough"

Developers on Reddit are seriously evaluating Alibaba's Qwen-35B-A3B as a local replacement for Claude Opus 4.7 in daily coding workflows.

Apr 20 · 3 min read
Qwen · Alibaba

Qwen 3.6 35B Runs "Browser OS" Locally: Open-Source Models Are Closing the Gap

A developer ran Alibaba's Qwen 3.6 35B locally to achieve "Browser OS": AI orchestrating a browser like an OS, no cloud needed.

Apr 19 · 1 min read
Alibaba · Qwen

One Consumer GPU, an Entire Accounting App Rewritten by AI: Alibaba's New Qwen Model Makes "Running Locally" Real

Alibaba's Qwen3.6-35B-A3B rewrote a full accounting app on a single RTX 5070 Ti in under an hour, a task where older models failed.

Apr 18 · 3 min read
Qwen · Alibaba

Alibaba Releases Qwen3.6-35B-A3B Mixture-of-Experts Model

Alibaba's Qwen team releases Qwen3.6-35B-A3B, a 35B-parameter MoE model activating 3B parameters per token.

Apr 16 · 2 min read
Qwen · Qwen3.6-35B-A3B

Qwen3.6-35B-A3B Released!

Alibaba's Qwen team releases a 35B sparse MoE model with only 3B active parameters under Apache 2.0.

Apr 16 · 3 min read
Qwen · Amazon SageMaker

Fine-Tune Qwen 2.5 for Tool Calling with SageMaker RLVR

AWS SageMaker serverless RLVR fine-tuning improved Qwen 2.5 7B tool-call accuracy by 57% without GPU management.

Apr 7 · 2 min read
llama.cpp · Qwen

37 LLMs Benchmarked on MacBook Air M5 32GB: Full Speed Results

A community benchmark of 37 local LLMs on an M5 Air 32GB using llama-bench reveals MoE models as clear winners for speed-to-quality ratio.

Apr 6 · 2 min read