
OpenMythos · Anthropic · Kye Gomez

A 22-Year-Old Rebuilt Anthropic's Secret Architecture by Guessing

What This Is

Anthropic has a model architecture called Mythos and has never disclosed any technical details about it. Kye Gomez, a 22-year-old developer and the founder of the agentic framework Swarms, gathered public papers and industry speculation, pieced together what he believes approximates the Mythos architecture, and released it as an open-source project called OpenMythos.

The core idea behind the architecture is the Recurrent-Depth Transformer (RDT). Instead of stacking hundreds of layers, each with its own parameters, as mainstream models do, an RDT runs the same set of parameters through multiple computation passes and activates different "expert modules" on each pass. Concretely, the same weights execute up to 16 passes, each taking a different expert pathway. The entire reasoning process unfolds inside the model's internal vector space: no intermediate steps are exposed, and only the final answer is returned.
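A toy sketch of that loop may help. Everything below is an illustrative assumption, not OpenMythos's or Anthropic's code: the tiny sizes, the names (`one_pass`, `recurrent_depth_forward`, `W_router`, `W_experts`), top-1 routing, the `tanh` expert, and the damped update that re-injects the input are all made up for illustration. What it does show is the structural point: one shared parameter set is applied up to 16 times, the router can pick a different expert on each pass, and only the final hidden state leaves the loop.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8            # toy hidden size (illustrative assumption)
N_EXPERTS = 4    # toy number of expert modules (illustrative assumption)
MAX_PASSES = 16  # "up to 16 passes", as described in the article

# A single shared parameter set, reused on every pass (no per-layer weights).
W_router = rng.normal(scale=0.5, size=(D, N_EXPERTS))
W_experts = rng.normal(scale=0.3, size=(N_EXPERTS, D, D))

def one_pass(h, x):
    """One recurrent pass: route to one expert, apply it, and re-inject the input x."""
    k = int(np.argmax(h @ W_router))         # top-1 routing depends on the current state
    h_new = 0.5 * h + 0.5 * np.tanh(h @ W_experts[k]) + 0.1 * x  # damped update keeps h bounded
    return h_new, k

def recurrent_depth_forward(x, passes=MAX_PASSES):
    """Run the shared block `passes` times; intermediate states are never returned."""
    h = x.copy()
    path = []                                 # expert chosen on each pass (for inspection only)
    for _ in range(passes):
        h, k = one_pass(h, x)
        path.append(k)
    return h, path

# The stored parameter count is fixed; it does not grow with the number of passes.
n_params = W_router.size + W_experts.size    # 8*4 + 4*8*8 = 288

x = rng.normal(size=D)                        # stand-in for an input embedding
h_final, path = recurrent_depth_forward(x)
print(path)                                   # expert index used on each of the 16 passes
```

Adding passes changes how much computation each answer receives, not how many parameters are stored: `n_params` is the same whether `passes` is 1 or 16.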

On parameter efficiency, a paper from UCSD and Together AI reports that a 770M-parameter RDT model matches a 1.3B-parameter standard Transformer: roughly 40% fewer parameters for equivalent performance. More significant is generalization: on tests involving "knowledge combinations not seen during training," the recurrent architecture still produces correct answers where standard models fail outright.
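The arithmetic behind the parameter comparison, using only the two model sizes reported above:

```latex
\frac{770 \times 10^{6}}{1.3 \times 10^{9}} \approx 0.592
\qquad\Longrightarrow\qquad
1 - 0.592 \approx 0.41
```

In other words, the RDT model uses about 59% of the baseline's parameters, roughly 41% fewer.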

Industry View

Those who support this direction argue that the bottleneck for large models today is no longer "how many facts were memorized" but "whether known facts can be combined to answer novel questions." Running a recurrent architecture for additional passes at inference time appears to confer this combinatorial ability essentially for free, without expanding training scale. If that judgment holds, the competitive axis of the AI industry's next phase shifts from "train bigger models" to "make existing models reason deeper at inference time."

The counterarguments deserve equal seriousness. First, OpenMythos is fundamentally a speculative reconstruction: Anthropic has never confirmed that Mythos uses this architecture, and Gomez himself acknowledges it is "an integration of mainstream guesses." Second, the 770M-versus-1.3B experiment is small-scale; there is no evidence yet that the findings replicate at larger parameter counts. Third, the stability of recurrent inference (keeping successive computation passes from diverging) currently depends on specific injection mechanisms whose engineering reliability has not been validated at scale. Academic interest in a direction is one thing; production-grade deployment is another.
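To make "keeping passes from diverging" concrete, here is a minimal sketch of one common stability idea. It is an assumption that this resembles the injection mechanisms mentioned; the article does not say which one OpenMythos uses. The state update is a linear map A plus a re-injected input term, h_{t+1} = A h_t + B e, and the loop is stable when the spectral radius of A is below 1. Real recurrent blocks add a learned nonlinear transformation on top; this toy keeps only the linear part, with a diagonal A so that its spectral radius is simply the largest |a_i|.

```python
import numpy as np

def run_loop(a_diag, B, e, passes):
    """Iterate h <- A h + B e, with A = diag(a_diag), starting from h = 0."""
    h = np.zeros_like(e)
    injected = B @ e                  # the encoded input is re-injected on every pass
    for _ in range(passes):
        h = a_diag * h + injected     # elementwise product == diagonal matrix times vector
    return h

rng = np.random.default_rng(0)
D = 4
B = rng.normal(size=(D, D))
e = rng.normal(size=D)                # stand-in for the encoded input

a_stable = np.full(D, 0.9)            # spectral radius 0.9 < 1
a_unstable = np.full(D, 1.1)          # spectral radius 1.1 > 1

h_stable = run_loop(a_stable, B, e, passes=200)
h_unstable = run_loop(a_unstable, B, e, passes=200)

# With spectral radius < 1 the loop converges to the fixed point h* = (I - A)^{-1} B e.
fixed_point = (B @ e) / (1 - a_stable)
print(np.allclose(h_stable, fixed_point))                              # True
print(np.linalg.norm(h_unstable) / np.linalg.norm(fixed_point) > 1e6)  # True: the state blows up
```

The same 200 passes that settle down with 0.9 explode with 1.1, which is why stability has to be designed into the update rather than assumed.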

Impact on Regular People

For enterprise IT: If the "fewer parameters, more inference passes" approach is validated by the mainstream, the hardware barrier to enterprise AI deployment may fall. If smaller models can deliver equivalent capability, the cost math for on-premises private deployment has to be redone.
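A back-of-the-envelope sketch of what the parameter gap means for hardware, assuming 16-bit weights (2 bytes per parameter). The helper name `weight_memory_gb` is hypothetical. It counts weight memory only: activations, the KV cache, and runtime overhead are excluded, and a looped model still spends compute on every pass, so lower memory does not automatically mean lower latency.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory needed to hold the weights alone, in GB (1 GB = 1e9 bytes).

    bytes_per_param=2.0 assumes fp16/bf16 storage; 1.0 would model 8-bit
    and 0.5 would model 4-bit quantized weights.
    """
    return n_params * bytes_per_param / 1e9

for name, n in [("770M RDT", 770e6), ("1.3B Transformer", 1.3e9)]:
    print(f"{name}: {weight_memory_gb(n):.2f} GB")
# 770M RDT: 1.54 GB
# 1.3B Transformer: 2.60 GB
```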

For individual careers: This kind of technical development will not change how individuals use AI tools in the short term. But it signals that AI's capability ceiling in "complex reasoning" and "knowledge synthesis" is being pushed further out, and that judgment in knowledge work may be displaced by AI faster than most people expect.

For the consumer market: Recurrent inference architectures are naturally suited to "think slowly, then deliver an answer" scenarios rather than real-time conversation. Consumer-facing products will see no change in the short term, but in the medium term we may see a more clearly tiered product landscape, with a "deep reasoning mode" and a "fast response mode" offered as distinct, labeled options.
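One way such tiers could be exposed is as a per-request pass budget over the same weights. The sketch below is hypothetical: the mode names, the 2-pass fast setting, the `BLOCK_PARAMS` size, and the rough "2 FLOPs per parameter per pass" forward-cost rule (a common dense-model approximation; for a mixture-of-experts block, count the parameters active on each pass) are all assumptions. Only the 16-pass maximum comes from the article.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReasoningMode:
    name: str
    passes: int  # times the shared block runs per token (hypothetical knob)

# Hypothetical tiers; 16 is the article's stated maximum, 2 is an arbitrary fast setting.
FAST = ReasoningMode("fast response", passes=2)
DEEP = ReasoningMode("deep reasoning", passes=16)

BLOCK_PARAMS = 100_000_000  # hypothetical size of the shared recurrent block

def flops_per_token(mode: ReasoningMode, block_params: int = BLOCK_PARAMS) -> float:
    """Rough forward cost: about 2 FLOPs per parameter for each pass of the shared block."""
    return 2 * block_params * mode.passes

print(flops_per_token(DEEP) / flops_per_token(FAST))  # 8.0
```

Both tiers load identical weights; the deep tier simply spends about 8 times the compute per token, which is where the extra latency of "thinking slowly" would come from.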

Source: juejin.cn