DeepSeek's cache-hit price drops to as low as two cents per million tokens, and that figure is reshaping developer habits: with expensive models handling decision-making and cheap models doing the grunt work, the economics of multi-agent collaboration are finally becoming viable.

What this is

A workflow has recently gained popularity in the open-source community. Its core logic is to make OpenAI's Codex the "boss," responsible for breaking tasks down and reviewing results, while Claude Code, wired to DeepSeek's cheap API, acts as the "employee," handling token-hungry execution chores such as reading code and running tests.

In the past, developers using agents (AI systems that autonomously perceive their environment and execute tasks) to write code often had a single model do everything from start to finish. The main thread became clogged with failure logs and intermediate code, driving staggering consumption of tokens (the basic billing unit for large models, roughly one Chinese character or half an English word). The new model splits the work: high-cost decision-making stays with the main model, high-frequency execution is offloaded to the cheap model, and the main model handles only final acceptance and fallback.
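The split described above can be sketched as a simple dispatcher. This is a minimal illustration, not the actual workflow's code: the model names, task kinds, and per-million-token prices are all hypothetical placeholders standing in for real API calls.

```python
from dataclasses import dataclass

# Hypothetical per-million-token prices, for illustration only.
PRICE = {"boss": 15.00, "worker": 0.02}

@dataclass
class Task:
    kind: str    # "plan", "review", or "execute"
    tokens: int  # estimated token usage

def route(task: Task) -> str:
    """Decision-making goes to the expensive model; grunt work goes cheap."""
    return "boss" if task.kind in ("plan", "review") else "worker"

def run_pipeline(tasks, worker_ok=True):
    """Dispatch each task; if the cheap worker fails, fall back to the boss."""
    cost, log = 0.0, []
    for t in tasks:
        model = route(t)
        if model == "worker" and not worker_ok:
            model = "boss"  # fallback: the boss redoes failed execution work
        cost += PRICE[model] * t.tokens / 1_000_000
        log.append(model)
    return log, round(cost, 6)
```

With this split, a 500,000-token execution chore costs about one cent on the worker tier, while the boss only ever sees the small planning and review prompts.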

Industry view

We believe the popularity of this "contractor crew" model marks a shift in how developers apply large models, moving from "competing on capability" to "calculating costs." Concentrating the computing power of expensive models on global judgments rather than wasting it on sifting through logs is an extremely pragmatic approach to resource scheduling.

But it is worth warning that multi-agent collaboration is not a silver bullet. Critics point out that this model places extremely high demands on the "boss" model's ability to understand instructions. If the main model's task breakdown is unclear, the sub-agents will churn out work in the wrong direction, generating even more wasted spend. At the same time, chaining multiple nodes in series makes debugging harder: when an intermediate step fails, the troubleshooting cost far exceeds that of a single-model loop. The hidden management costs of system complexity are easily masked by the initial thrill of saving money.

Impact on regular people

For enterprise IT: Purchasing AI tools is no longer about blindly choosing the most expensive model; instead, it requires establishing a hybrid invocation mechanism to dynamically allocate computing budgets based on task attributes.
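A "hybrid invocation mechanism" of the kind described could be as simple as a policy function that weighs task attributes against the remaining budget. This is a hedged sketch under assumed names ("premium"/"economy" tiers, a hypothetical budget floor), not any vendor's actual API.

```python
def pick_model(task_kind: str, remaining_budget: float, floor: float = 1.0) -> str:
    """Choose a model tier from task attributes and remaining budget.

    Decision-heavy tasks ("plan", "review") get the premium model,
    but only while the budget stays above a configurable floor;
    everything else, and any near-exhausted budget, routes to economy.
    (Tier names and the floor value are illustrative assumptions.)
    """
    if task_kind in ("plan", "review") and remaining_budget > floor:
        return "premium"
    return "economy"
```

In practice the policy would also consider latency requirements and data sensitivity, but the core idea is the same: the allocation decision is explicit, cheap to compute, and auditable.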

For the individual workplace: Developers are rapidly transitioning into "AI foremen." Knowing how to break down requirements and validate AI output is becoming more important than the ability to write code by hand.

For the consumer market: The price war at the underlying API layer is drastically reducing the deployment costs of complex, multi-step AI applications. Consumers can expect access to sophisticated intelligent-assistant services at lower prices.