A user calling GPT-5.5-medium through Codex unexpectedly intercepted the model's internal Chain of Thought (the intermediate reasoning the model produces before delivering its final answer). The output consisted entirely of telegraphic short sentences, virtually identical to the "use caveman language for CoT to save inference costs" proposal posted in the r/LocalLLaMA community five months ago. Our read: OpenAI has baked reasoning-token compression into the model's underlying behavior.
What this is
The leaked output looks like this: "Need absolute path. Need know cwd absolute. v:... Use markdown. final with path. Need avoid bogus path." Extremely short subject-verb fragments, no articles, no modifiers, like a caveman speaking. This isn't a bug; it's GPT-5.5 actively choosing compressed expression during internal reasoning. Five months ago, a post on r/LocalLLaMA proposed exactly this: forcing the model to use rudimentary language for its Chain of Thought cuts the token count, thereby lowering inference cost. OpenAI now appears to have productized that community idea.
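To make the savings concrete, here is a minimal sketch comparing a verbose reasoning trace against a telegraphic one. The sample sentences are illustrative paraphrases (not actual model output beyond the leaked fragment), and a naive whitespace split stands in for a real BPE tokenizer, so the absolute counts are rough; the compression trend is what matters.

```python
# Rough illustration of token savings from telegraphic CoT.
# Whitespace word count is a crude proxy for a real tokenizer
# (e.g. tiktoken's BPE would give different absolute numbers).

verbose = ("I need to return an absolute path, so first I should "
           "determine the current working directory. I will format "
           "the final answer in markdown and make sure the path "
           "actually exists rather than fabricating one.")

# Telegraphic version, modeled on the leaked fragment.
telegraphic = ("Need absolute path. Need know cwd absolute. "
               "Use markdown. final with path. Need avoid bogus path.")

def rough_tokens(text: str) -> int:
    """Whitespace word count as a crude token proxy."""
    return len(text.split())

v, t = rough_tokens(verbose), rough_tokens(telegraphic)
print(f"verbose: {v} tokens, telegraphic: {t} tokens")
print(f"compression ratio: {t / v:.0%}")
```

Even this crude proxy shows the telegraphic trace using well under half the tokens of the verbose one, which is the whole economic argument.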
Industry view
The stronger the reasoning ability, the longer the Chain of Thought, and the more ferocious the token consumption. This has been the shared cost bottleneck for every reasoning model since o1. Using compressed language for internal reasoning is currently the most pragmatic engineering fix: it requires no architecture changes, only a shift in how the model expresses itself. But opposing voices are equally clear. Excessive compression risks dropping critical reasoning steps, especially in math or tasks with long logic chains, and the equilibrium between saving tokens and preserving quality is extremely difficult to nail. More ironically, Chain of Thought was supposed to be the selling point for interpretability; compressed into cipher-like short phrases, it actually becomes less interpretable.
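The cost argument reduces to simple arithmetic. The sketch below uses entirely hypothetical numbers (the per-token price, the request volume, and the average CoT lengths are all assumptions, not OpenAI figures) to show how a shorter reasoning trace translates directly into daily spend.

```python
# Back-of-envelope cost model for CoT compression.
# ALL numbers are hypothetical assumptions, not real OpenAI pricing.

PRICE_PER_1M_TOKENS = 10.00   # assumed $/1M reasoning tokens
REQUESTS_PER_DAY = 100_000    # assumed workload

def daily_cost(avg_cot_tokens: int) -> float:
    """Daily spend on reasoning tokens alone, in dollars."""
    total_tokens = avg_cot_tokens * REQUESTS_PER_DAY
    return total_tokens / 1_000_000 * PRICE_PER_1M_TOKENS

baseline = daily_cost(2_000)    # assumed verbose chain of thought
compressed = daily_cost(900)    # assumed telegraphic chain of thought
print(f"baseline:   ${baseline:,.2f}/day")
print(f"compressed: ${compressed:,.2f}/day")
print(f"savings:    {1 - compressed / baseline:.0%}")
```

Because reasoning tokens are billed like any other output tokens, the savings scale linearly with the compression ratio, which is why a purely linguistic trick moves the cost needle at all.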
Impact on regular people
For enterprise IT: Inference cost is the largest operational expenditure in deploying Agents. Chain of Thought compression means the same budget can run more inference tasks, directly improving ROI.
For individual careers: Expect AI products' visible "thinking process" to become increasingly hard to read. This isn't regression; it's deliberate compression.
For the consumer market: The output users ultimately see remains unchanged; the change is all inside the model's head. But lower inference costs could spawn cheaper subscription plans.