Models behaving like interns

OpenAI’s chief scientist said current models are approaching the capability of a human research intern, and separate reporting describes Codex completing a multi-step task inside Adobe Lightroom without a special plug-in, which suggests models are getting better at real software workflows. (businessinsider.com) The shift matters because it signals that vendors are focusing on agents that can act across messy toolchains, not just answer prompts.

OpenAI’s chief scientist, Jakub Pachocki, said current artificial intelligence models are getting close to the level of a human research intern, which is a much narrower and more concrete claim than “human-level intelligence.” He was describing systems that can gather information, follow instructions, and produce usable drafts, but still need supervision. (businessinsider.com) That “intern” comparison matters because an intern does not just answer one question. A research intern is expected to read messy material, pull out the useful parts, and hand back something a manager can check and improve. (businessinsider.com)

A separate Business Insider report described OpenAI’s Codex completing a multi-step task inside Adobe Lightroom, which is Adobe’s photo-editing software, without a special Lightroom plug-in. The model reportedly used the software the way a person would, by working through the interface instead of calling a custom shortcut behind the scenes. (businessinsider.com) That is a different skill from writing a neat answer in a chat box. Software like Adobe Lightroom hides actions inside menus, sliders, panels, and changing screen layouts, so the hard part is often finding the right button in the right order. (businessinsider.com)

OpenAI has been building directly toward that kind of work. Its Computer Use tool lets a model click, type, scroll, and inspect screenshots, and OpenAI says developers should run it in an isolated browser or virtual machine with a human reviewing high-impact actions. (openai.com) OpenAI made the same point when it introduced Operator in January 2025. The company said Operator’s Computer-Using Agent was trained to interact with graphical user interfaces, meaning the buttons, menus, and text fields people see on a screen. (openai.com)

Codex started as a software engineering agent, not a general desktop robot. OpenAI’s launch post said each Codex task runs in its own cloud sandbox and can write features, fix bugs, answer questions about a codebase, and propose pull requests for review. (openai.com) Since then, OpenAI has kept adding more “coworker” behavior around Codex. The company’s October 6, 2025 general availability release added a Slack integration, a software development kit, and admin tools for teams, which makes Codex look less like autocomplete and more like a worker that can be assigned jobs. (openai.com)

OpenAI’s own developer docs now define agents as systems that plan, call tools, collaborate across specialists, and keep enough state to finish multi-step work. That is almost the job description for a junior employee who has to use several apps to get one thing done. (openai.com) The jump from “good at prompts” to “good at workflows” is the real story here. A model that can move through a cluttered tool chain, recover when a screen changes, and finish a task across several steps is much closer to replacing the first draft of office work than a model that only writes polished paragraphs. (openai.com)
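
For readers who want to picture the "human reviewing high-impact actions" advice attached to the Computer Use tool, here is a minimal sketch of a review gate in Python. It is not OpenAI's API: `Action`, `run_with_review`, `execute`, and `approve` are hypothetical names invented for illustration, and the `high_impact` flag is an assumption about how a developer might mark risky steps.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Action:
    """One hypothetical interface step proposed by a model."""
    kind: str                  # e.g. "click", "type", "scroll", "screenshot"
    target: str                # human-readable description of what is acted on
    high_impact: bool = False  # assumption: the developer flags risky steps


def run_with_review(
    actions: Iterable[Action],
    execute: Callable[[Action], None],
    approve: Callable[[Action], bool],
) -> List[str]:
    """Run actions in order, asking a reviewer before any high-impact one."""
    log: List[str] = []
    for action in actions:
        # Only flagged actions are sent to the human reviewer.
        if action.high_impact and not approve(action):
            log.append(f"skipped {action.kind}: {action.target}")
            continue
        execute(action)
        log.append(f"ran {action.kind}: {action.target}")
    return log


if __name__ == "__main__":
    plan = [
        Action("click", "Develop panel"),
        Action("type", "export filename"),
        Action("click", "Delete originals", high_impact=True),
    ]
    # Stand-ins: a real setup would drive a sandboxed browser or virtual
    # machine and prompt an actual person instead of always declining.
    for line in run_with_review(plan, execute=lambda a: None, approve=lambda a: False):
        print(line)
```

The design choice worth noting is that the gate sits outside the model: routine steps pass straight through, and anything flagged waits for a person, which matches the supervised "intern" framing above.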
