Microsoft tracks tokens per joule

- Microsoft and other big tech groups are tightening internal AI use in May 2026 after agent-heavy workflows pushed cloud spending higher.
- Microsoft executives are now watching “tokens per joule,” while separate reporting said some agentic workflows used up to 1,000 times more tokens than standard AI jobs.
- Microsoft’s own earnings materials and recent reporting point to more metered billing, governance and usage tracking across AI products.

Microsoft’s latest internal AI cost debate is not just about model quality. It is about how much work an agent does, how many tokens it burns to do it, and how much electricity and infrastructure sit behind that output. Recent reporting has tied that shift to rising internal usage bills across large technology companies, while Microsoft has also been describing a different yardstick in public: tokens per watt per dollar, or, in other accounts, tokens per joule.

That combination matters because it links two layers of the same problem. One is the software bill created when employees and teams run longer, more autonomous AI workflows. The other is the physical efficiency of the systems producing those outputs in Microsoft’s data centers. Microsoft has not publicly framed the issue as a retreat from AI, but its own disclosures and adjacent reporting show a company paying closer attention to how AI usage translates into cost. (tech.yahoo.com)

Thread:

1/ Microsoft’s emerging metric is simple to state and hard to ignore: how many useful AI tokens a system can produce for a given amount of energy and infrastructure cost. In Microsoft’s fiscal 2026 second-quarter earnings call, the company said the key metric it is optimizing for is “tokens per watt per dollar,” tying AI performance directly to utilization and total cost of ownership. (microsoft.com)

2/ Separate reporting in May 2026 said Microsoft executives were increasingly focused on “tokens per joule,” a closely related way of asking the same question: not just can a model run, but can it run economically at scale. DigiTimes described the metric as one Microsoft is using to distinguish viable AI systems from hype.

3/ The pressure behind that shift comes from agentic AI. Reporting aggregated by Tom’s Hardware and Yahoo said companies including Microsoft, Meta and Amazon have been rethinking internal AI usage after employee “tokenmaxxing” and agent-style workflows drove up bills. (microsoft.com) Those reports said some agentic tasks consumed as much as 1,000 times more tokens than standard AI jobs. (digitimes.com)

4/ “Tokenmaxxing” refers to employees inflating AI usage metrics by running unnecessary or low-value tasks. Reporting on Amazon described weekly adoption targets for more than 80% of developers and internal leaderboards tracking token use, with some workers saying colleagues were using tools to maximize token counts rather than business value. Similar behavior has been reported around other large tech employers. (tech.yahoo.com)

5/ The core problem is that agents compound usage. A standard chatbot request may involve one prompt and one response. An agent can plan, call tools, inspect files, retry failed steps, spawn subtasks and produce long traces. Each of those steps can add input and output tokens, and every retry can multiply cost again. That is an inference based on how metered token billing works and on reporting about higher-cost agent workflows. (letsdatascience.com)

6/ Microsoft’s broader product moves fit that picture. Reporting in April said GitHub Copilot was moving toward token-based or credit-based billing because absorbing rising inference costs was “no longer sustainable.” That does not prove a direct link to internal pullbacks, but it does show Microsoft pushing pricing and governance closer to actual model consumption. (wheresyoured.at)

7/ The practical consequence is that AI teams are being pushed toward bounded agents rather than open-ended ones. If every extra tool call, retry loop and verbose output has a visible cost, then routing smaller jobs to cheaper models, limiting recursion, caching repeated work and tracing usage become management issues, not just engineering preferences. That is an inference from Microsoft’s stated efficiency focus and the reported rise in token-heavy workflows. (theregister.com)

8/ Microsoft is also building more formal oversight around AI deployment. CNBC reported on May 23 that Jenny Lay-Flurrie, who became head of Microsoft’s Trusted Technology Group in February, described responsible technology as both “how do we build it right?” and “how do we keep it that way?” That language points to continuous monitoring, not one-time launches. (microsoft.com)

9/ Read together, the reporting suggests the industry is moving from celebrating agent autonomy to measuring agent efficiency. The question is no longer only whether an agent can complete a task. It is whether the task justifies the tokens, electricity and infrastructure it consumes. Microsoft’s own earnings language shows that calculation is already part of how it talks about AI operations. (cnbc.com)

10/ The next places to watch are Microsoft disclosures on AI infrastructure efficiency, GitHub Copilot billing changes and any further reporting on internal usage controls at Microsoft, Meta and Amazon. Microsoft’s investor materials and product pricing updates are likely to show where these cost metrics become policy. (microsoft.com)
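
For readers who want to see why agent workflows compound token use (item 5), how a step budget bounds it (item 7), and what a tokens-per-joule ratio measures (items 1 and 2), here is a minimal Python sketch. It is an illustration only, not Microsoft's method or any vendor's billing formula. Every name and number in it (`StepProfile`, `agent_tokens`, `max_steps_within`, the 500-token prompt, the 25% retry rate) is a hypothetical assumption, and it assumes the agent re-sends its full prior output on every step.

```python
from dataclasses import dataclass


@dataclass
class StepProfile:
    """Hypothetical per-step token profile for an agent loop (illustrative only)."""
    prompt_tokens: int   # fixed instructions sent on every step
    output_tokens: int   # tokens generated on every step
    retry_rate: float    # expected extra attempts per step (0.25 = one retry in four)


def chat_tokens(prompt_tokens: int, output_tokens: int) -> int:
    """A standard chatbot request: one prompt, one response."""
    return prompt_tokens + output_tokens


def agent_tokens(profile: StepProfile, steps: int) -> float:
    """Expected tokens for an agent run of `steps` steps.

    Assumption: each step's input is the fixed prompt plus all prior outputs,
    so later steps cost more than earlier ones, and retries scale each step.
    """
    total = 0.0
    history = 0  # tokens of earlier output carried forward as context
    for _ in range(steps):
        step_cost = profile.prompt_tokens + history + profile.output_tokens
        total += step_cost * (1 + profile.retry_rate)
        history += profile.output_tokens
    return total


def max_steps_within(profile: StepProfile, token_budget: float) -> int:
    """Largest step count whose expected usage stays within the token budget."""
    steps = 0
    while agent_tokens(profile, steps + 1) <= token_budget:
        steps += 1
    return steps


def tokens_per_joule(tokens: float, avg_power_watts: float, seconds: float) -> float:
    """Tokens divided by energy; one joule is one watt sustained for one second."""
    return tokens / (avg_power_watts * seconds)


if __name__ == "__main__":
    profile = StepProfile(prompt_tokens=500, output_tokens=300, retry_rate=0.25)
    single = chat_tokens(500, 300)
    agent = agent_tokens(profile, steps=20)
    print(f"single request: {single} tokens")
    print(f"20-step agent:  {agent:.0f} tokens ({agent / single:.0f}x)")
    print(f"steps within a 20,000-token budget: {max_steps_within(profile, 20_000)}")
```

Under these made-up inputs, a 20-step agent run uses 91,250 expected tokens against 800 for a single request, a gap of roughly 114 times, and a 20,000-token budget caps the run at 8 steps. The reported figure of up to 1,000 times would require longer runs, larger contexts or spawned subtasks than this toy profile assumes.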

Shared from Scout - Be the smartest in the room.