What this is

Qwen3.6-27B achieved 95.7% accuracy on SimpleQA (a factual Q&A benchmark) running on a single RTX 3090. For the first time, locally deployed AI deep search has approached the level of cloud products like Perplexity.

This is the progress from the open-source project LDR (Local Deep Research). LDR adopts a LangGraph Agent (an AI program capable of autonomously calling tools and making step-by-step decisions to complete tasks) strategy: the model autonomously calls search tools, breaks down sub-problems, and iterates across multiple rounds, up to a maximum of 50. In essence, a small model compensates for its lack of parameters by "searching more and thinking further."

The project maintainers raised a noteworthy observation: in deep search tasks, tool-calling capability matters more than the model's raw size. Qwen3.6's improvements in structured output and tool calling are exactly what Agent scenarios need most. This means that when choosing a model, a "latest small model" may be better suited for Agent tasks than an "older large model."

LDR also does a few things rare in the open-source community: academic source rating (integrating the OpenAlex and DOAJ databases to judge source quality), user data encrypted with SQLCipher (unreadable even by admins), zero telemetry, and Docker images with SLSA signatures. MIT license, fully open-source.
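The iterative strategy described above, where the model decides each round whether to search again or answer, capped at a maximum round count, can be sketched in plain Python. This is a minimal illustration, not LDR's actual code: `plan_next_step` and `run_search` are hypothetical stand-ins for the LLM's tool-calling decision and the search tool.

```python
MAX_ROUNDS = 50  # LDR caps iteration at 50 rounds

def plan_next_step(question, evidence):
    # Hypothetical stand-in for the LLM's structured tool-calling decision:
    # answer once any gathered evidence mentions the question's key term,
    # otherwise request another search round.
    key_term = question.split()[0].lower()
    if any(key_term in item.lower() for item in evidence):
        return {"action": "answer", "text": evidence[-1]}
    return {"action": "search", "query": question}

def run_search(query):
    # Hypothetical stand-in for a web/academic search tool call.
    return [f"Result about {query}"]

def deep_search(question):
    # The core loop: search, accumulate evidence, re-plan, repeat.
    evidence = []
    for round_no in range(MAX_ROUNDS):
        step = plan_next_step(question, evidence)
        if step["action"] == "answer":
            return step["text"], round_no
        evidence.extend(run_search(step["query"]))
    return "No confident answer", MAX_ROUNDS
```

The design choice this mirrors is that extra rounds substitute for parameters: a 27B model that can reliably emit structured tool calls gets many chances to gather evidence before answering, which is why tool-calling quality matters more here than raw model size.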

Industry view

Supporters believe this is a genuine inflection point for local AI practicality. Over the past year, open-source deep search projects have consistently lagged behind cloud solutions in effectiveness. A single RTX 3090 matching Perplexity means viable solutions now exist for privacy-sensitive scenarios (legal, medical, and financial research).

However, the opposition is equally compelling. First, benchmark contamination risk: SimpleQA's questions may have leaked into Qwen3.6's training data, so inflated scores are entirely possible. Second, language bias: xbench-DeepSearch is a Chinese benchmark, and Qwen, as one of the open-source models with the strongest Chinese capabilities, holds a natural advantage there. Third, it has not yet entered the harder exam rooms: BrowseComp and GAIA, the two benchmarks the community recognizes as the hardcore tests of deep search capability, have not yet yielded results. Fourth, self-evaluation noise: LLMs grading their own answers carry a systematic bias; although spot checks with Opus suggested the self-scores tend to underestimate rather than inflate, this bias cannot be ruled out.

Our judgment: the 95.7% needs a discount, but even discounted to 85%, a single-GPU local deep search reaching 80% of a commercial product's effectiveness is a usable starting point.

Impact on regular people

For enterprise IT: AI research assistants that keep data on-premises have moved from "proof of concept" to "ready for trial." LDR's encrypted storage and zero-telemetry design can pass internal security reviews more readily than the model itself; compliance-sensitive industries should take note.

For individual knowledge workers: a new free option joins the information retrieval toolkit. A second-hand RTX 3090 (around 5,000 RMB) can deliver deep search capabilities close to Perplexity Pro, offering practical value for roles with high-frequency information retrieval needs.

For the consumer market: as open-source local solutions approach commercial products, Perplexity, Tavily, and others must find differentiated barriers in multimodality, product experience, and data ecosystems. Pure search accuracy is no longer a sufficient moat.