A user calling GPT-5.5-medium through Codex unexpectedly intercepted the model's internal Chain of Thought (the intermediate reasoning the model produces before delivering its final answer). The output consisted entirely of telegraphic short sentences, virtually identical to the "use caveman language for CoT to save inference costs" proposal posted in the r/LocalLLaMA community five months ago. Our read: OpenAI has baked reasoning-token compression into the model's underlying behavior.
What this is
The leaked output looks like this: "Need absolute path. Need know cwd absolute. v:... Use markdown. final with path. Need avoid bogus path." Extremely short subject-verb fragments, no articles, no modifiers, like a caveman speaking. This isn't a bug; it's GPT-5.5 actively choosing compressed expression during internal reasoning. Five months ago, a post on r/LocalLLaMA proposed exactly this: forcing the model to use rudimentary language for its Chain of Thought cuts the token count, thereby lowering inference cost. OpenAI now appears to have productized that community idea.
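To make the savings concrete, here is a minimal sketch comparing a verbose reasoning trace against a telegraphic one. The sample sentences are illustrative paraphrases (not actual model output beyond the leaked fragment), and a naive whitespace split stands in for a real BPE tokenizer, so the absolute counts are rough; the compression trend is what matters.

```python
# Rough illustration of token savings from telegraphic CoT.
# Whitespace word count is a crude proxy for a real tokenizer
# (e.g. tiktoken's BPE would give different absolute numbers).

verbose = ("I need to return an absolute path, so first I should "
           "determine the current working directory. I will format "
           "the final answer in markdown and make sure the path "
           "actually exists rather than fabricating one.")

# Telegraphic version, modeled on the leaked fragment.
telegraphic = ("Need absolute path. Need know cwd absolute. "
               "Use markdown. final with path. Need avoid bogus path.")

def rough_tokens(text: str) -> int:
    """Whitespace word count as a crude token proxy."""
    return len(text.split())

v, t = rough_tokens(verbose), rough_tokens(telegraphic)
print(f"verbose: {v} tokens, telegraphic: {t} tokens")
print(f"compression ratio: {t / v:.0%}")
```

Even this crude proxy shows the telegraphic trace using well under half the tokens of the verbose one, which is the whole economic argument.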
Industry view
The stronger the reasoning ability, the longer the Chain of Thought, and the more ferocious the token consumption. This has been the shared cost bottleneck for every reasoning model since o1. Using compressed language for internal reasoning is currently the most pragmatic engineering fix: it requires no architecture changes, only a shift in how the model expresses itself. But opposing voices are equally clear. Excessive compression risks dropping critical reasoning steps, especially in math or tasks with long logic chains, and the equilibrium between saving tokens and preserving quality is extremely difficult to nail. More ironically, Chain of Thought was supposed to be the selling point for interpretability; compressed into cipher-like short phrases, it actually becomes less interpretable.
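The cost argument reduces to simple arithmetic. The sketch below uses entirely hypothetical numbers (the per-token price, the request volume, and the average CoT lengths are all assumptions, not OpenAI figures) to show how a shorter reasoning trace translates directly into daily spend.

```python
# Back-of-envelope cost model for CoT compression.
# ALL numbers are hypothetical assumptions, not real OpenAI pricing.

PRICE_PER_1M_TOKENS = 10.00   # assumed $/1M reasoning tokens
REQUESTS_PER_DAY = 100_000    # assumed workload

def daily_cost(avg_cot_tokens: int) -> float:
    """Daily spend on reasoning tokens alone, in dollars."""
    total_tokens = avg_cot_tokens * REQUESTS_PER_DAY
    return total_tokens / 1_000_000 * PRICE_PER_1M_TOKENS

baseline = daily_cost(2_000)    # assumed verbose chain of thought
compressed = daily_cost(900)    # assumed telegraphic chain of thought
print(f"baseline:   ${baseline:,.2f}/day")
print(f"compressed: ${compressed:,.2f}/day")
print(f"savings:    {1 - compressed / baseline:.0%}")
```

Because reasoning tokens are billed like any other output tokens, the savings scale linearly with the compression ratio, which is why a purely linguistic trick moves the cost needle at all.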
Impact on regular people
For enterprise IT: Inference cost is the largest operational expenditure in deploying Agents. Chain of Thought compression means the same budget can run more inference tasks, directly improving ROI.
For individual careers: Expect AI products' visible "thinking process" to become increasingly hard to read. This isn't regression; it's deliberate compression.
For the consumer market: The output users ultimately see remains unchanged; the change is all inside the model's head. But lower inference costs could spawn cheaper subscription plans.