Microsoft focuses on tokens per joule

- Microsoft executives are discussing “tokens per joule” in May 2026 as a way to judge AI systems by efficiency, not capability alone. (digitimes.com)
- Microsoft Research said in April 2026 that agentic coding tasks can consume 1,000 times more tokens than code chat or reasoning. (microsoft.com)
- Microsoft Build is scheduled for June 2-3, 2026, in San Francisco and online, where AI systems and scaling remain central topics. (build.microsoft.com)

Microsoft’s emerging “tokens per joule” language is a sign that the company’s AI conversation is moving from model spectacle to operating economics. DigiTimes reported on May 23 that Microsoft executives are using the phrase internally to compare AI systems by how much useful output they produce for the energy they consume. (digitimes.com)

The phrase matters because it combines two pressures already visible across Microsoft’s AI stack: token consumption and power demand. (microsoft.com) GitHub, which Microsoft owns, said Copilot billing will move to usage-based pricing starting June 1, 2026, with charges tied to input, output and cached tokens. (docs.github.com)

Microsoft has not publicly laid out a formal “tokens per joule” framework in its own documentation. But the company’s recent research and product pricing changes show the same underlying concern: how much compute an agent uses, how variable that usage is, and whether the result justifies the cost. (digitimes.com)

### Why would Microsoft need a metric beyond model quality?

Microsoft Research published a paper in April 2026 on token consumption in agentic coding tasks that framed the problem directly. The paper said widespread use of AI agents in complex workflows is driving rapid growth in LLM token consumption and asked which models are more token-efficient. (microsoft.com)

The same paper found agentic tasks were “uniquely expensive,” consuming 1,000 times more tokens than code reasoning and code chat, and said higher token use did not reliably produce higher accuracy. (microsoft.com) Runs on the same task could differ by as much as 30 times in total tokens, according to the paper.

That makes an efficiency metric useful for a company selling enterprise AI tools, cloud infrastructure and coding assistants.
If two systems complete the same task with different token and energy profiles, the cheaper and less power-hungry one can be easier to deploy at scale. That is an inference from Microsoft’s research and billing changes, not a direct company statement. (microsoft.com)

### What does “tokens per joule” actually try to capture?

DigiTimes described the phrase as a way to separate practical systems from hype by looking at token and energy efficiency together. (digitimes.com) In plain terms, it points to output per unit of electricity rather than raw benchmark capability.

The “token” side is already becoming a line item in Microsoft products. GitHub Docs say Copilot interactions consume input, output and cached tokens, and that those tokens are priced by model, with rates listed per 1 million tokens. (docs.github.com)

The “joule” side reflects the physical constraints behind AI inference. A recent Joule paper said test-time scaling and agentic workflows are becoming routine and that long outputs can drive a disproportionate share of aggregate energy demand. (cell.com)

### Is Microsoft applying this only to agents?

The strongest public evidence so far points to agentic systems. (geopoliticspulse.com) Microsoft Research’s April paper focused on agentic coding tasks and asked where agents spend tokens, which models are more token-efficient, and whether agents can predict token usage before execution. (microsoft.com)

That focus fits the economics of agents. Unlike a single chatbot response, an agent may plan, call tools, retrieve context, retry steps and generate multiple intermediate outputs, each of which adds token and compute load. That workflow-level cost is the part a metric like “tokens per joule” would help expose.

### How does this fit with Microsoft’s broader AI posture?

Jenny Lay-Flurrie, the head of Microsoft’s Trusted Technology Group, told CNBC on May 23 that responsible technology means asking, “How do we make sure that we build it right? And how can we make sure that it stays right?” (cnbc.com) That comment was about responsible technology broadly, not energy accounting specifically. But it shows Microsoft is publicly emphasizing governance and durability at the same time its research teams are publishing on token efficiency and its developer products are moving to usage-based billing. (microsoft.com)

Microsoft Build is scheduled for June 2-3, 2026, in San Francisco and online. The event guide says attendees will go deep on “real code and real systems” with teams building and scaling AI at Microsoft, a setting where efficiency metrics and operating costs are likely to remain part of the discussion. (build.microsoft.com)
