Xiaomi MiMo Tops Reasoning Test: Cost-Efficiency Beats Parameter Count

Xiaomi's MiMo-V2.5-Pro cost $0.99 per game in a complex social reasoning test, a little over one-third of Kimi K2.6's $2.65. LLM competition is shifting from "who is smarter" to "who is cheaper and good enough."

What this is

Developers recently tested major LLMs in "Blood on the Clocktower," a highly complex social deduction game similar to "Werewolf." Xiaomi's MiMo-V2.5-Pro ranked in the top tier alongside Kimi K2.6, but got there in a very different way. Kimi reasons meticulously, but it consumes an average of 580,000 tokens (the smallest unit of text a model processes) per game, takes 10-15 hours, and costs $2.65. MiMo consumes only about 180,000 tokens per game, finishes in 2-3 hours, and costs $0.99, with a tool-calling error rate of only 0.4% (tool calling is when the model requests external functions to execute tasks). We note that MiMo is emerging as the most practical choice among high-end models by being "smart enough and highly cost-efficient."
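To make the comparison concrete, the per-game figures quoted above can be turned into ratios with a few lines of Python. This is only a back-of-the-envelope sketch: the `KIMI` and `MIMO` dictionaries and the helper names are illustrative, the 180,000-token figure is approximate, and the "blended rate" is simply cost divided by total tokens, not either vendor's published pricing.

```python
# Per-game averages quoted in the article.
KIMI = {"tokens": 580_000, "cost_usd": 2.65}
MIMO = {"tokens": 180_000, "cost_usd": 0.99}  # token count is approximate


def ratio(a: float, b: float) -> float:
    """Return a / b, i.e. how large a is relative to b."""
    return a / b


def usd_per_million_tokens(model: dict) -> float:
    """Blended cost per million tokens implied by the per-game figures.

    This mixes input and output tokens, so it is NOT a price-list rate.
    """
    return model["cost_usd"] / model["tokens"] * 1_000_000


cost_ratio = ratio(MIMO["cost_usd"], KIMI["cost_usd"])
token_ratio = ratio(MIMO["tokens"], KIMI["tokens"])

print(f"MiMo cost vs Kimi:   {cost_ratio:.0%}")
print(f"MiMo tokens vs Kimi: {token_ratio:.0%}")
print(f"Kimi blended rate: ${usd_per_million_tokens(KIMI):.2f} per 1M tokens")
print(f"MiMo blended rate: ${usd_per_million_tokens(MIMO):.2f} per 1M tokens")
```

On these figures MiMo's cost is about 37% of Kimi's and its token usage about 31%. The implied blended rate per token is in fact slightly higher for MiMo, which suggests the saving comes from using fewer tokens per game rather than from a lower per-token price.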

Industry view

We believe this test exposes a real pain point in LLM deployment: reasoning redundancy. Kimi's approach of piling on compute for maximum accuracy is not cost-effective in commercial scenarios; verbose outputs and long response times will directly deter enterprise users. MiMo preserves core reasoning capability while compressing cost and response time into a usable range, and its 0.4% tool-calling error rate indicates excellent stability. However, MiMo's win rate is severely imbalanced: 88% when playing the good faction, but only 48% when playing the evil faction. This exposes a weakness in strategic flexibility. When camouflage or non-logical maneuvering is required, it appears too "honest" and clumsy. Game-testing metrics also cannot be fully equated with reliability in enterprise production environments.
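The size of that imbalance is easier to judge with a quick calculation. The sketch below uses only the two win rates quoted above. The `evil_share` parameter is hypothetical, since the article does not say how often MiMo was assigned to each faction, so the blended figures are illustrations rather than reported results.

```python
GOOD_WIN_RATE = 0.88  # from the article
EVIL_WIN_RATE = 0.48  # from the article


def blended_win_rate(evil_share: float) -> float:
    """Overall win rate if a fraction `evil_share` of games are played as evil.

    `evil_share` is a hypothetical input; the article does not report it.
    """
    if not 0.0 <= evil_share <= 1.0:
        raise ValueError("evil_share must be between 0 and 1")
    return (1 - evil_share) * GOOD_WIN_RATE + evil_share * EVIL_WIN_RATE


gap_points = (GOOD_WIN_RATE - EVIL_WIN_RATE) * 100
print(f"Gap between factions: {gap_points:.0f} percentage points")

for share in (0.25, 0.5):  # hypothetical faction splits
    print(f"evil share {share:.0%}: blended win rate {blended_win_rate(share):.0%}")
```

The gap is 40 percentage points, so the overall win rate depends heavily on how often the model has to play the evil side.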

Impact on regular people

For enterprise IT: As the reasoning cost of a single complex task drops below $1, deploying Agents (AIs capable of independently executing multi-step tasks) at scale in long-running business processes becomes financially viable.

For the workplace: LLMs can now handle complex multi-person social reasoning and maneuvering. Negotiation and coordination roles that rely on information asymmetry and complex communication will face new pressure from automation.

For the consumer market: Hardware manufacturers like Xiaomi are using low-cost models to penetrate the edge ecosystem. In the future, consumers may be able to access cheap, very fast reasoning services directly on their phones and in their vehicles.
