Yann Dubois: AI progress feels real
- Yann Dubois, an OpenAI post-training leader, appeared in a May 21 podcast episode arguing that recent AI progress feels tangible when products become reliable.
- The episode’s framing centered on “useful, reliable systems,” with Matt Turck describing a shift from raw model capability to product-grade deployment.
- The full discussion is available on YouTube and podcast platforms, where Dubois and Turck discuss evals, hallucinations and agent workflows.

Yann Dubois used a May 21 podcast appearance to argue that recent advances in artificial intelligence feel more concrete because the systems around the models have improved alongside the models themselves. Dubois, who says on his personal site that he leads OpenAI’s Post-training Frontiers team, appeared on “The MAD Podcast” with investor Matt Turck in an episode titled “Why AI Progress Suddenly Feels Real.” The episode description says the conversation focused on the shift from “raw model capability” to “useful, reliable systems,” including evals, hallucinations, agentic workflows and continual learning.

### Why did Dubois say progress now “feels real”?

Matt Turck’s May 21 episode description said AI “suddenly feels like it has crossed a threshold,” and framed the conversation around why that change is now visible to users. The description said Dubois discussed what changed with recent reasoning models and why post-training has become “one of the most important frontiers in AI.” (poddtoppen.se)

That framing points to a distinction that has become more common in AI product discussions: capability inside the model is one thing, but user trust depends on whether the system behaves consistently in an application. The available public descriptions do not provide a full transcript, but they repeatedly describe the episode as a discussion of reliability, real-world utility and deployment rather than a single benchmark result. (poddtoppen.se)

### What role does Dubois play at OpenAI?

Yann Dubois says on his personal website that he is an OpenAI researcher who leads the Post-training Frontiers team. He writes that the team trains “the agentic models shipped across Codex, the API, and ChatGPT Thinking/Pro,” and lists systems including o3, GPT-5 Thinking, GPT-5.3 Codex and GPT-5.5.
The podcast listings describe him in similar terms, calling him co-lead of the Post-training Frontiers team and saying his group led post-training work behind OpenAI’s reasoning models, including GPT-5.5. (youtube.com) Those descriptions place the discussion in the part of the stack where model behavior is refined for deployment. (yanndubs.github.io)

### Which engineering layers were emphasized?

The May 21 episode description named evals, “model-as-judge,” hallucinations, agentic workflows, GDPval and continual learning as central topics. A separate listing said the discussion covered the move from reinforcement learning on verifiable tasks such as math and coding into “messy, real-world utility.” (poddtoppen.se)

Those topics are the layers that determine whether a model can be shipped in a product without breaking user expectations. Evaluation pipelines test regressions and failure modes. Latency and efficiency determine whether a feature is usable at scale. Observability and workflow tooling help teams see where an output came from, how expensive it was and where a system failed. That reading is an inference from the episode descriptions and from Dubois’s stated role in post-training and agentic model deployment. (poddtoppen.se)

### Why does that matter more than another benchmark?

The podcast metadata said the episode examined “the difference between GPT-5.5 Thinking and GPT-5.5 Pro,” “how reasoning models like GPT-5.5 actually work,” and “the evaluation bottleneck.” Those references suggest the discussion was less about a single model launch than about the practical conditions that make model gains visible in products.
(poddtoppen.se) Matt Turck’s posted chapter list also highlighted “model reliability,” “how reinforcement learning cures AI hallucinations,” and “why startups should focus on the last mile of AI.” That chapter list does not substitute for a transcript, but it shows that the episode was organized around deployment and reliability questions rather than pure research novelty.

### What comes next in this debate?

The YouTube video and podcast listings remain the primary public record of the discussion as of May 23, 2026. (poddtoppen.se) The next step for readers is the source material itself: the May 21 episode on YouTube and podcast platforms, where Dubois and Turck lay out the case that post-training, evals and system reliability now shape how AI progress is experienced. (youtube.com) (digg.com)