Token‑maxxing is trending
A thread by Gergely Orosz flagged “tokenmaxxing” as a widespread developer trend at large tech firms: teams optimizing code and systems for AI token usage and throughput. (x.com)
A token is the unit an artificial intelligence system is metered in, often a word or part of a word. The count keeps climbing the way the numbers on a taxi meter do while the car is moving. In March 2026, The New York Times reported that engineers at Meta, OpenAI, and Shopify were already watching internal dashboards that ranked people by how many of those units they burned. (nytimes.com) (newsnationnow.com) That is why “tokenmaxxing” spread so fast: it turns invisible computing spend into a visible office score. One OpenAI engineer reportedly processed 210 billion tokens in a single week, which The New York Times said was enough text to fill Wikipedia 33 times. (nytimes.com) (deccanherald.com)
The raw numbers got this big because the new coding tools do not just answer one question and stop. Products like Claude Code can run for long stretches, reading files, writing code, calling tools, and looping through fixes, so one engineer can run up millions or billions of tokens without sitting at the keyboard the whole time. (platform.claude.com) (techround.co.uk)
The other reason is that models now have much bigger context windows, the short-term memory they work from on each request. Google says Gemini models can take in 1 million tokens of context, which it compares to about 50,000 lines of code or eight average-length English novels in one go. (ai.google.dev) Once models can swallow that much text, engineers change the way they work. Instead of carefully trimming a prompt down to the essentials, they can dump in whole repositories, years of chat history, huge log files, and long specification documents and let the model sort through the pile. (ai.google.dev)
That creates a weird split inside companies. Some teams are “tokenmaxxing” by pushing more context and more agent loops through the system, while others do the opposite, rewriting prompts, caches, and workflows so the same job uses fewer tokens and returns faster. (developers.openai.com) (platform.claude.com) (i-scoop.eu) Either way, the money is real even when the unit sounds abstract.
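A rough back-of-the-envelope sketch shows why agent loops and big context windows compound each other. It assumes a stateless chat API where every turn resends the whole conversation so far as input. The function name and the starting context size, per-turn growth, and turn count are hypothetical values chosen for illustration, not figures from the reporting.

```python
# Back-of-the-envelope sketch, not any vendor's actual agent loop.
# Assumption: a stateless chat API, so every turn resends the whole
# conversation so far and all of it is billed again as input.

def cumulative_input_tokens(base_context: int, added_per_turn: int, turns: int) -> int:
    """Total input tokens billed over `turns` model calls (hypothetical helper).

    base_context:   tokens in the first prompt (files, specs, logs).
    added_per_turn: hypothetical average tokens each turn appends to the
                    history (model output plus tool results).
    """
    total = 0
    context = base_context
    for _ in range(turns):
        total += context            # full history is sent and billed again
        context += added_per_turn   # history grows before the next call
    return total


if __name__ == "__main__":
    one_shot = cumulative_input_tokens(100_000, 5_000, 1)
    agent_run = cumulative_input_tokens(100_000, 5_000, 50)
    print(one_shot)   # 100000
    print(agent_run)  # 11125000
```

Under these assumptions, a 50-step task that starts from a 100,000-token prompt bills about 11 million input tokens. That is more than a hundred times the single-shot cost, even though the final call (345,000 tokens) still fits comfortably in a 1-million-token window. The growth term scales with the square of the turn count. This repeated prefix is what cached-prompt discounts are aimed at.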
OpenAI’s pricing page lists flagship models billed per 1 million tokens, and Anthropic’s pricing page does the same. Both also list lower rates for cached input. Resending the same long context on every request is expensive enough that both companies built special pricing to reward reusing it instead of processing it again at full price. (developers.openai.com) (platform.claude.com) That is how a leaderboard turns into a budget problem. The same March 2026 reporting said one Claude Code user at Anthropic ran up more than $150,000 in a month. That is the kind of bill that forces finance teams to ask whether the model is doing useful work or just very busy-looking work. (nytimes.com) (economictimes.indiatimes.com)
Gergely Orosz’s post landed because it named a behavior many engineers were already seeing: once token counts become visible, people optimize for the meter. That can mean smarter system design and higher throughput. It can also become the artificial intelligence version of judging programmers by lines of code instead of by whether the software actually works. (threadreaderapp.com) (itsmeduncan.com)
The trend is probably not going away, because the big model companies keep offering longer context windows and cheaper bulk usage at the same time. When the memory gets bigger and the per-token price gets easier to justify, companies stop asking “can we afford to run this?” and start asking “how much can we push through it before the monthly bill shows up?” (ai.google.dev) (developers.openai.com) (platform.claude.com)
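The arithmetic behind that monthly bill is simple. Divide each token count by one million and multiply it by a per-million rate, with cached input billed at a lower rate than fresh input. The sketch below assumes three rates: fresh input, cached input, and output. The dollar figures and the names `Pricing` and `cost_usd` are hypothetical placeholders, not current OpenAI or Anthropic prices or APIs. The sketch also ignores any extra charge a provider may apply when it first writes a prompt to the cache.

```python
# Minimal per-1M-token cost sketch. All prices are hypothetical
# placeholders, not current OpenAI or Anthropic rates.
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float         # USD per 1M fresh (uncached) input tokens
    cached_input_per_m: float  # USD per 1M input tokens served from cache
    output_per_m: float        # USD per 1M output tokens


def cost_usd(p: Pricing, input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Bill for a workload; cached_tokens is the part of input_tokens read from cache.

    Ignores cache-write surcharges, batch discounts, and other pricing tiers.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh = input_tokens - cached_tokens
    return (fresh * p.input_per_m
            + cached_tokens * p.cached_input_per_m
            + output_tokens * p.output_per_m) / 1_000_000


if __name__ == "__main__":
    prices = Pricing(input_per_m=3.00, cached_input_per_m=0.30, output_per_m=15.00)
    # 10M input tokens and 200k output tokens: first with no cache hits,
    # then with 9M of the input served from cache.
    print(round(cost_usd(prices, 10_000_000, 0, 200_000), 2))          # 33.0
    print(round(cost_usd(prices, 10_000_000, 9_000_000, 200_000), 2))  # 8.7
```

With these placeholder rates, serving 90% of the input from cache cuts the bill from $33.00 to $8.70 for the same work. This is why rewriting prompts and caches to reuse context pays off at scale.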