Thinking Machines previews a full-duplex AI model that talks back faster

Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati, is showing off a different kind of AI: one that can listen and speak at the same time. The company says its new full-duplex interaction models are designed so that they do not wait for you to stop talking before they start forming a reply. In theory, that moves AI away from the stop-and-start awkwardness of a chat window and toward something closer to an actual conversation.

The headline number is a response time of 0.40 seconds for TML-Interaction-Small, which the company says is roughly in line with the pace of natural human conversation and faster than comparable models from OpenAI and Google. That is a strong claim for a research preview, and a familiar one in AI: the demo is always smoother than the product. Still, the underlying idea is sensible. Voice assistants and chatbots have long treated turn-taking as an afterthought, even though that is where most of the friction lives.

What Thinking Machines is actually testing

This is not a public release yet. Thinking Machines says a limited research preview is coming in the next few months, with broader access planned later this year. That puts the company in a very common AI position: enough proof to attract attention, not enough product for anyone to really kick the tires.

The larger trend here is obvious, even if the branding is a little theatrical. AI labs are moving beyond raw text generation and into interaction design, because speed alone is not enough if the system still feels like a laggy typist. OpenAI, Google, and others have all been racing toward more natural voice and multimodal experiences; Thinking Machines is arguing that the architecture itself should be built for overlap, interruption, and back-and-forth from the start.

Why full-duplex matters for AI assistants

It lets the model process input while generating output.
It makes conversation feel less like dictation software and more like a call.
It could reduce the dead air that makes many AI assistants feel slow, even when the answer is technically good.

If the preview holds up outside a demo, the payoff could be less about raw benchmark bragging and more about habit change. People tolerate a lot from AI, but they are much less patient with awkward timing. A system that can respond while you are still speaking would not just sound smarter; it would feel less like software pretending to be conversational.

The real test starts with users

The unanswered question is whether that 0.40-second claim survives messy real life: accents, interruptions, background noise, and users who change their minds halfway through a sentence. That is where many AI features go to die, usually after the launch hype has already done its job. If Thinking Machines can make the experience feel natural instead of merely fast, it will have found something the bigger labs also want.
