Study: remote coding agents boost output 17.3×

- Ethan Mollick on June 2 highlighted an NBER working paper that tracked more than 100,000 GitHub developers across successive generations of AI coding tools. (nber.org)
- The paper’s core finding was a widening gap between coding activity and shipped software: autonomous coding agents lifted commits 180%, while releases rose 30%. (nber.org)
- The underlying study is NBER Working Paper 35275, “Writing Code vs. Shipping Code,” by Mert Demirer, Leon Musolff and Liyuan Yang. (nber.org)

Ethan Mollick on June 2 pointed to a new economics paper with a simple headline result: newer AI coding tools are associated with much larger increases in code output than earlier ones, but much smaller gains in software that actually ships. Mollick’s post summarized the progression as 2.2x for autocomplete tools, 7.4x for local agents and 17.3x for remote coding agents, and said releases rose only about 30% because human review and other downstream steps remained constraints. (nber.org)

The underlying paper is an NBER working paper dated May 2026, “Writing Code vs. Shipping Code: Productivity Effects Across Generations of AI Coding Tools,” by Mert Demirer, Leon Musolff and Liyuan Yang. (nber.org) The authors said they combined data on more than 100,000 GitHub developers with AI usage telemetry to study how successive generations of tools changed software output.

### Which paper is Mollick pointing to?

NBER Working Paper 35275 asks how productivity effects evolve across generations of AI coding tools and how much of those gains reach final output. The authors grouped tools into three generations: autocomplete, interactive coding agents and autonomous coding agents. (techtwitter.com)

Mollick’s June 2 post translated those categories into more familiar product terms: autocomplete tools such as Copilot, local agents such as early Claude Code, and current remote coding agents. That wording is Mollick’s summary, not the paper’s formal taxonomy. (nber.org)

### What did the study actually measure?

The NBER paper said it used a matched event-study design and tracked outcomes across a production hierarchy rather than stopping at raw coding activity. Its abstract says autocomplete, interactive coding agents and autonomous coding agents increased coding activity, measured as commits, by cumulative effects of 40%, 140% and 180%, respectively. (nber.org)

Those percentages are not the same metric as Mollick’s 2.2x, 7.4x and 17.3x figures.
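As a quick arithmetic check on that point, the sketch below converts the abstract’s cumulative percentage effects into multiples of a pre-adoption baseline (a 180% increase means 2.8 times the baseline) and sets them beside Mollick’s multipliers. It assumes each percentage is a simple increase over a baseline normalized to 1.0; the paper’s event-study estimates may be defined differently, so treat this as illustrative arithmetic, not a reconstruction of either set of figures. The dictionary and function names are hypothetical.

```python
# Cumulative effects on commits reported in the abstract, as percent increases.
# Assumption: each is a simple increase over a pre-adoption baseline of 1.0.
abstract_effects_pct = {
    "autocomplete": 40,
    "interactive coding agents": 140,
    "autonomous coding agents": 180,
}

# Multipliers from Mollick's June 2 post (his product framing, not the paper's taxonomy).
mollick_multipliers = {
    "autocomplete": 2.2,
    "interactive coding agents": 7.4,  # "local agents" in Mollick's wording
    "autonomous coding agents": 17.3,  # "remote coding agents" in Mollick's wording
}


def pct_increase_to_multiple(pct: float) -> float:
    """Convert a percent increase into a multiple of the baseline (180% -> 2.8x)."""
    return 1 + pct / 100


for tool, pct in abstract_effects_pct.items():
    implied = pct_increase_to_multiple(pct)
    print(f"{tool}: +{pct}% = {implied:.1f}x baseline vs. Mollick's {mollick_multipliers[tool]}x")
```

Even read as multiples, the abstract’s commit effects top out at about 2.8x, well short of 17.3x, which is why the two sets of numbers cannot be read as the same measure.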
Based on the available excerpt of the paper, the published abstract emphasizes cumulative effects on commits, projects and releases, while Mollick’s post presents a separate output framing drawn from the research. (techtwitter.com) His multipliers therefore appear to be an interpretation of, or an additional calculation from, the underlying study materials rather than figures stated in the abstract itself.

### Why did releases rise far less than coding output?

The paper’s most cited number is the drop-off from writing code to shipping it. The authors found that the 180% cumulative effect of autonomous coding agents on commits fell to 50% for the number of projects and to 30% for actual releases. (nber.org)

The authors said that pattern was consistent with a “weak-link hypothesis,” under which strong gains in one stage of production are limited by slower stages elsewhere. They estimated an elasticity of substitution of 0.25 between AI and human effort, which they said indicated strong complementarities between the two. (nber.org)

In plain terms, the study suggests AI speeds up code writing faster than organizations can review, integrate, approve and ship the results. The paper also said it checked outcomes across four major app marketplaces and found a moderate increase in the number of new apps but no increase in total usage. (nber.org)

### Does other research line up with that pattern?

A January 2026 paper from Carnegie Mellon researchers also found that coding agents can raise development velocity while increasing quality risks. That study reported that repositories adopting agent-generated pull requests saw large early throughput gains in some settings, but also persistent increases in static-analysis warnings and cognitive complexity of about 18% and 39%, respectively. (nber.org)

That is a different dataset and a different method, but it points in the same direction: more output does not automatically mean cleaner code or more software that ships and gets used.

### Where does the research go next?
The NBER paper is dated May 2026 and identified as a working paper, which means it is circulating for discussion and comment and has not been peer reviewed or subjected to the review by the NBER Board of Directors that accompanies official NBER publications. (nber.org) The next concrete reference point is whether the authors release fuller tables or a journal version clarifying how the output multipliers cited by Mollick map onto the abstract’s commit, project and release measures. (nber.org) (arxiv.org)
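The weak-link hypothesis described earlier can be illustrated with a standard two-input constant-elasticity-of-substitution (CES) production function, in which an elasticity σ below 1 means the inputs are complements. The sketch below is a hypothetical illustration, not the authors’ model: it assumes a CES form with equal weights on AI and human effort, normalizes both inputs to 1.0 before adoption, and plugs in the 0.25 elasticity the paper reports, with a σ = 2 case for contrast. Its numbers are not calibrated to the paper’s estimates.

```python
def ces_output(ai: float, human: float, sigma: float, weight: float = 0.5) -> float:
    """Two-input CES aggregate: (w*A^rho + (1-w)*H^rho)^(1/rho), rho = (sigma-1)/sigma.

    Hypothetical illustration: equal weights and unit baselines are assumptions,
    not parameters from the paper. Requires sigma > 0 and sigma != 1.
    """
    rho = (sigma - 1) / sigma
    return (weight * ai**rho + (1 - weight) * human**rho) ** (1 / rho)


HUMAN = 1.0  # human review/integration effort held fixed at its baseline

for sigma in (0.25, 2.0):  # 0.25 as reported by the authors; 2.0 as a substitutes contrast
    for ai_multiple in (1.0, 2.0, 4.0, 16.0, 1000.0):
        y = ces_output(ai_multiple, HUMAN, sigma)
        print(f"sigma={sigma}: AI input x{ai_multiple:>7.1f} -> output x{y:.3f}")

# With sigma < 1 the fixed human input caps output: as AI input grows without
# bound, output approaches (1 - weight) ** (1 / rho) = 0.5 ** (-1/3), about 1.26.
print(f"ceiling at sigma=0.25: {0.5 ** (-1 / 3):.3f}")
```

Under σ = 0.25, even a thousandfold increase in the AI input leaves output below about 1.26 times its baseline, while under σ = 2 output keeps growing with the AI input. That contrast is the intuition behind reading a low elasticity as strong complementarity.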

Shared from Scout - Be the smartest in the room.