Thinking Machines launches interaction models

- Thinking Machines Lab, Mira Murati’s startup, unveiled “interaction models” on May 11: a research preview built for live audio, video, and text collaboration. (thinkingmachines.ai)
- The concrete hook is speed and structure: 200 ms micro-turns, full-duplex input and output, and a limited preview first, with a broader release planned later this year. (thinkingmachines.ai)
- The bigger shift is strategic: the lab argues that interactivity itself should scale with intelligence, which could reshape how realtime AI gets built and measured. (thinkingmachines.ai)

Realtime AI is the arena here, and the open question is what “good” should even mean. For the last two years, most frontier systems have been getting better at reasoning, coding, and long tasks, but the interaction itself still feels like email with a typing indicator. (thinkingmachines.ai) You speak, the model waits. The model speaks, you wait. Thinking Machines Lab says that basic pattern is the bottleneck. On May 11 it used its first major research preview to push a different idea: models that handle interaction natively across audio, video, and text, instead of faking it with software wrapped around a turn-based core.

### What did Thinking Machines actually launch?

A research preview of what it calls “interaction models”: a new class of multimodal systems meant to collaborate in real time, not just answer after the fact. (thinkingmachines.ai) The company frames this as a change to the model architecture, not just a nicer voice mode. The system should take in signals continuously, keep thinking while new signals are still arriving, and respond without forcing the human to stop and package every thought into a neat prompt.

### Why is turn-taking the problem?

Because real collaboration is messy. People interrupt, backchannel, point at screens, correct themselves, and react to visual context while someone else is still talking. (thinkingmachines.ai) Thinking Machines argues that today’s models freeze perception while generating output, so the human has to adapt to the machine’s rhythm. That works for chat. It works less well for tutoring, pair programming, meetings, live support, or anything else that depends on timing and shared attention.

### What is the technical trick?

The company says it trains the model from scratch with a multi-stream, micro-turn design. In plain English, the model processes interaction in tiny slices (200-millisecond chunks) across different streams, instead of waiting for one long serialized turn to end.
(thinkingmachines.ai) VentureBeat’s writeup describes this as full-duplex behavior: the model can listen, talk, and see at the same time. That is closer to a phone call, where both sides can speak at once, than to a walkie-talkie or a chatbot with a submit button.

### Why does that matter beyond demos?

Because latency changes behavior. A smart model that answers half a second too late can feel clumsy in conversation, even if its underlying reasoning is strong. (thinkingmachines.ai) Thinking Machines is essentially arguing that intelligence and responsiveness should not be split into separate product layers and traded off against each other. If interactivity is built into the model, then scaling the model should improve both how well it thinks and how naturally it collaborates. That is the core thesis.

### Is this a product launch?

Not yet, in the usual sense. The company calls this a research preview. A limited preview is planned in the coming months, and a wider release later in 2026. (thinkingmachines.ai) So the news is less “here is the app you can use today” and more “here is the architecture direction this lab wants to define.” That still matters. Thinking Machines has been relatively quiet about what it is building, and this is its clearest statement so far.

### What’s the business angle?

Benchmarks. If you claim interactivity is a first-class capability, you need ways to measure turn-taking, grounding, interruption handling, recovery after mistakes, and long-lived exchanges. (thinkingmachines.ai) Existing evals mostly reward static answers. That opens a real market for rubric design, human-in-the-loop testing, synthetic interaction harnesses, and enterprise acceptance tests for live AI systems. The infrastructure around the model could become almost as important as the model itself.

### So what should readers take away?

This is a bet on a different center of gravity for AI: not just smarter outputs, but better shared timing.
If that bet lands, the winners in realtime AI may not be the labs with the flashiest voice demo, but the ones that make interruption, perception, and recovery feel natural enough that people stop noticing the interface at all. (thinkingmachines.ai)
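To make the micro-turn idea and the interruption-handling benchmark a little more concrete, here is a toy Python simulation. It is a hypothetical sketch, not anything Thinking Machines has published. It compares two assumed models. The first re-checks its input after every 200 ms slice and stops talking once it hears the user. The second is a turn-based baseline whose perception stays frozen until its reply ends. The function names and the "interruption latency" metric are invented for illustration. Only the 200 ms slice size comes from the article.

```python
"""Toy comparison of turn-based vs. 200 ms micro-turn interruption handling.

Hypothetical sketch: the policies, names, and metric are illustrative
assumptions, not Thinking Machines' architecture or API.
"""

CHUNK_MS = 200  # micro-turn size reported for interaction models

Span = tuple[int, int]  # user speech interval [start_ms, end_ms)


def overlaps(t_ms: int, span: Span) -> bool:
    """True if user speech [start, end) overlaps the slice [t, t + CHUNK_MS)."""
    start, end = span
    return start < t_ms + CHUNK_MS and end > t_ms


def yield_ms_turn_based(reply_ms: int, speech: list[Span]) -> int:
    """Half-duplex baseline: perception is frozen while the model talks,
    so it only stops when its own reply is finished.
    (`speech` is unused; it keeps both policies on one signature.)"""
    return reply_ms


def yield_ms_micro_turn(reply_ms: int, speech: list[Span]) -> int:
    """Full-duplex toy policy: after each 200 ms slice the model has heard
    that slice, and it stops as soon as any user speech appeared in it."""
    for t in range(0, reply_ms, CHUNK_MS):
        if any(overlaps(t, span) for span in speech):
            # The decision lands at the end of the slice, capped at reply end.
            return min(t + CHUNK_MS, reply_ms)
    return reply_ms


def interruption_latency_ms(yield_ms: int, speech: list[Span]) -> int:
    """Hypothetical eval metric: time from the first user interruption to the
    moment the model stops talking. Assumes the interruption starts while
    the model is still speaking."""
    first_start = min(start for start, _ in speech)
    return yield_ms - first_start


if __name__ == "__main__":
    reply_ms = 3000          # model plans a 3-second spoken answer
    speech = [(1000, 1600)]  # user cuts in at 1.0 s for 0.6 s
    for name, policy in [("turn-based", yield_ms_turn_based),
                         ("micro-turn", yield_ms_micro_turn)]:
        stop = policy(reply_ms, speech)
        latency = interruption_latency_ms(stop, speech)
        print(f"{name}: stops at {stop} ms, interruption latency {latency} ms")
```

In this scenario the turn-based baseline keeps talking until 3000 ms, a 2000 ms interruption latency. The micro-turn policy stops at 1200 ms, a 200 ms latency. The point is structural. With fixed slices, the micro-turn policy never needs more than one slice to react, while the turn-based delay grows with the length of the reply. A real system would also have to tell backchannels such as "mm-hm" apart from genuine interruptions, which this toy ignores.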

Shared from Scout.