Odd Lots dissects viral AI chart
- Bloomberg’s Odd Lots published an April 25 episode and article unpacking the AI chart from Model Evaluation & Threat Research, or METR, that tracks the length of human tasks frontier models can complete.
- The episode centers on METR’s “50% time horizon” measure and a fresh data point: Claude Opus 4.6 can complete tasks that take skilled humans nearly 12 hours, Bloomberg said.
- The chart has become shorthand in debates over AI progress as new models arrive and forecasts shift from benchmark scores to real-world task length. (bloomberg.com)
Bloomberg’s Odd Lots spent its April 25 episode on one AI graphic: METR’s chart of how long a task frontier models can finish without human help. (bloomberg.com) (iheart.com)

METR stands for Model Evaluation & Threat Research, a nonprofit that tests whether an artificial intelligence agent can complete multi-step work a human expert would normally finish in minutes or hours. (metr.org 1) (metr.org 2) Its key metric is the “50% time horizon”: the length of a task, measured by how long it takes a skilled human, that a model completes correctly half the time. METR’s March 2025 paper said frontier systems had been doubling that task length about every seven months since 2019. (metr.org) (arxiv.org)

Odd Lots brought on METR president Chris Painter and technical staff member Joel Becker to explain what the chart actually measures and what it does not. Bloomberg’s write-up said the discussion focused on the mechanics and philosophy behind the benchmark. (bloomberg.com) (iheart.com)

The new hook was Claude Opus 4.6. Bloomberg’s episode description said the model could do a task that would take a human nearly 12 hours, far above the roughly 50-minute horizon METR reported for models such as Claude 3.7 Sonnet in its 2025 paper. (bloomberg.com) (arxiv.org)

That jump is why the chart keeps circulating each time OpenAI, Anthropic or Google releases a new flagship model. MIT Technology Review called it a “now-iconic graph” that has become central to arguments about how fast autonomous AI is improving. (technologyreview.com) (metr.org)

The chart is also contested. MIT Technology Review reported that critics argue people often read it as a clean forecast of artificial general intelligence, while METR’s own paper says the estimates have wide uncertainty and are limited to the task sets and agent setups tested. (technologyreview.com) (metr.theo-bearman.com)

Odd Lots’ contribution was less a new benchmark than a translation exercise for investors and general readers. It took a graph that usually lives inside AI labs and safety circles and put Chris Painter and Joel Becker on mic to explain why a line that goes up can still be easy to misuse. (bloomberg.com) (iheart.com)
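
For readers who want to see what the doubling claim implies, the arithmetic can be sketched in a few lines of Python. This is only an illustration built on the figures quoted above (a roughly 50-minute horizon, "nearly 12 hours" rounded up to 720 minutes, and a seven-month doubling time). It is not METR's fitting method, the function names are hypothetical, and it ignores the wide uncertainty METR attaches to its estimates.

```python
import math


def doublings_between(start_minutes: float, end_minutes: float) -> float:
    """Number of doublings needed to grow from one task length to another."""
    return math.log2(end_minutes / start_minutes)


def horizon_after(start_minutes: float, months: float,
                  doubling_months: float = 7.0) -> float:
    """Task length after `months` of steady exponential growth.

    The 7-month default is the doubling time quoted from METR's 2025 paper;
    treating it as constant is an assumption made for illustration.
    """
    return start_minutes * 2 ** (months / doubling_months)


# Figures taken from the article; 720 is "nearly 12 hours" rounded up.
start = 50.0
end = 12 * 60.0

n = doublings_between(start, end)
print(f"{n:.2f} doublings separate {start:.0f} minutes and {end:.0f} minutes")
print(f"{n * 7:.1f} months at a constant 7-month doubling time")
```

The point of the sketch is scale: moving from 50 minutes to about 12 hours is a little under four doublings, which is why a single new data point can shift how steep the trend line looks.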