Caveman trims AI replies to cut token bills

Developers are teaching chatbots to sound less like polite assistants and more like cave dwellers, and the reason is brutally practical: fewer tokens, lower bills. A tool called Caveman strips out greetings, filler, and softeners so models return only the useful bits – code, commands, URLs, and technical details – as companies scramble to control fast-rising AI costs.

The pitch is simple, almost offensively so. If a model can answer in half the words without losing meaning, it can also burn through a lot less compute. That has turned a stylistic hack into an operating expense strategy, which is how you end up with serious engineers trying to make artificial intelligence talk like it fell off a rock.

How Caveman cuts token use

Created by Julius Brussee, Caveman targets the bloated parts of model output that businesses rarely want but still pay for. In tests on Claude and Codex, the plug-in reportedly reduced generated tokens by 65-75%, with multiple compression levels for different use cases.

That kind of saving explains why the tool has already spread inside OpenAI, Nvidia, GitHub, and DEPT, according to Brussee. The awkward part for the industry is that the people building these models are now using extra tools to make them cheaper to use.
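
For a rough sense of what the reported 65-75% range could mean on an invoice, here is a minimal back-of-the-envelope sketch. It is not part of Caveman. The function names, the monthly token volume and the per-million-token price are all hypothetical placeholders, so substitute your provider's actual output-token rate.

```python
# Back-of-the-envelope estimate of output-token savings.
# All figures below are illustrative assumptions, not published prices.

def output_cost(tokens: int, price_per_million: float) -> float:
    """Cost in dollars of generating `tokens` output tokens."""
    return tokens / 1_000_000 * price_per_million


def estimated_savings(baseline_tokens: int, reduction: float,
                      price_per_million: float) -> float:
    """Dollars saved if output tokens shrink by `reduction` (0.0 to 1.0)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    remaining = round(baseline_tokens * (1 - reduction))
    return (output_cost(baseline_tokens, price_per_million)
            - output_cost(remaining, price_per_million))


if __name__ == "__main__":
    monthly_output_tokens = 1_000_000  # hypothetical workload
    price = 10.0                       # hypothetical $ per million output tokens
    for reduction in (0.65, 0.75):     # the range reported for Claude and Codex
        saved = estimated_savings(monthly_output_tokens, reduction, price)
        print(f"{reduction:.0%} fewer tokens -> about ${saved:.2f} saved")
```

The estimate covers generated tokens only. Input tokens, caching discounts and any instructions added to the prompt to enforce the terse style are left out, so real savings would differ.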

OpenAI, GitHub and terminal agents

GitHub records show OpenAI technical director Shayne Sweeney contributed to the Caveman repository and added support for Codex. The project has also evolved into a standalone terminal agent that uses roughly half as many tokens on similar tasks and works with OpenClaw.

Token reduction in Claude and Codex: 65-75%
Standalone terminal agent: roughly half as many tokens on similar tasks
Supported styles: from light compression to maximum brevity

Why companies are suddenly counting every token

Caveman fits a broader reset across enterprise AI budgets. Uber and Walmart have already limited employee use of AI tools, while Legrand has circulated guidance urging staff to curb model usage and switch to a “caveman language” mode when possible. The message is hard to miss: if the model chatters, finance pays.

OpenAI chief executive Sam Altman has said that even prompts padded with “please” and “thank you” can add tens of millions of dollars in electricity costs. That helps explain why consulting firms such as Accenture are now selling services tied to tokenomics, a niche that only exists because every extra word has started to look like a line item.

AI brevity and lower inference costs

The bigger question is whether this becomes normal product design or just another cost-cutting trick for the enterprise crowd. If model providers keep chasing lower inference costs, expect more tools that compress language, strip politeness and make AI answers sound increasingly spare – efficient, yes, but not exactly charming.
