What this is

Qwen3.6-27B achieved 95.7% accuracy on SimpleQA (a factual Q&A benchmark) running on a single RTX 3090. For the first time, locally deployed AI deep search has approached the level of cloud products like Perplexity.

This is the progress from the open-source project LDR (Local Deep Research). LDR adopts a LangGraph Agent (an AI program capable of autonomously calling tools and making step-by-step decisions to complete tasks) strategy: the model autonomously calls search tools, breaks down sub-problems, and iterates across multiple rounds, up to a maximum of 50. In essence, a small model compensates for its lack of parameters by "searching more and thinking further."

The project maintainers raised a noteworthy observation: in deep search tasks, tool-calling capability matters more than the model's raw size. Qwen3.6's improvements in structured output and tool calling are exactly what Agent scenarios need most. This means that when choosing a model, a "latest small model" may be better suited for Agent tasks than an "older large model."

LDR also does a few things rare in the open-source community: academic source rating (integrating the OpenAlex and DOAJ databases to judge source quality), user data encrypted with SQLCipher (unreadable even by admins), zero telemetry, and Docker images with SLSA signatures. MIT license, fully open-source.
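The iterative strategy described above, where the model decides each round whether to search again or answer, capped at a maximum round count, can be sketched in plain Python. This is a minimal illustration, not LDR's actual code: `plan_next_step` and `run_search` are hypothetical stand-ins for the LLM's tool-calling decision and the search tool.

```python
MAX_ROUNDS = 50  # LDR caps iteration at 50 rounds

def plan_next_step(question, evidence):
    # Hypothetical stand-in for the LLM's structured tool-calling decision:
    # answer once any gathered evidence mentions the question's key term,
    # otherwise request another search round.
    key_term = question.split()[0].lower()
    if any(key_term in item.lower() for item in evidence):
        return {"action": "answer", "text": evidence[-1]}
    return {"action": "search", "query": question}

def run_search(query):
    # Hypothetical stand-in for a web/academic search tool call.
    return [f"Result about {query}"]

def deep_search(question):
    # The core loop: search, accumulate evidence, re-plan, repeat.
    evidence = []
    for round_no in range(MAX_ROUNDS):
        step = plan_next_step(question, evidence)
        if step["action"] == "answer":
            return step["text"], round_no
        evidence.extend(run_search(step["query"]))
    return "No confident answer", MAX_ROUNDS
```

The design choice this mirrors is that extra rounds substitute for parameters: a 27B model that can reliably emit structured tool calls gets many chances to gather evidence before answering, which is why tool-calling quality matters more here than raw model size.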

Industry view

Supporters believe this is a genuine inflection point for local AI practicality. Over the past year, open-source deep search projects have consistently lagged behind cloud solutions in effectiveness. A single RTX 3090 matching Perplexity means viable solutions now exist for privacy-sensitive scenarios (legal, medical, and financial research).

However, the opposition is equally compelling. First, benchmark contamination risk: SimpleQA's questions may have leaked into Qwen3.6's training data, so inflated scores are entirely possible. Second, language bias: xbench-DeepSearch is a Chinese benchmark, and Qwen, as one of the open-source models with the strongest Chinese capabilities, holds a natural advantage there. Third, it has not yet entered the harder exam rooms: BrowseComp and GAIA, the two benchmarks the community recognizes as the hardcore tests of deep search capability, have not yet yielded results. Fourth, self-evaluation noise: LLMs grading their own answers carry a systematic bias; although spot checks with Opus suggested the self-scores tend to underestimate rather than inflate, this bias cannot be ruled out.

Our judgment: the 95.7% needs a discount, but even discounted to 85%, a single-GPU local deep search reaching 80% of a commercial product's effectiveness is a usable starting point.

Impact on regular people

For enterprise IT: AI research assistants that keep data on-premises have moved from "proof of concept" to "ready for trial." LDR's encrypted storage and zero-telemetry design can pass internal security reviews more readily than the model itself; compliance-sensitive industries should take note.

For individual knowledge workers: a new free option joins the information retrieval toolkit. A second-hand RTX 3090 (around 5,000 RMB) can deliver deep search capabilities close to Perplexity Pro, offering practical value for roles with high-frequency information retrieval needs.

For the consumer market: as open-source local solutions approach commercial products, Perplexity, Tavily, and others must find differentiated barriers in multimodality, product experience, and data ecosystems. Pure search accuracy is no longer a sufficient moat.