Xiaomi opens OmniVoice, a 646-language voice-cloning model

Xiaomi has released OmniVoice as open source, and the pitch is hard to ignore: a voice-cloning model that can reproduce a speaker’s voice from a short sample, then speak in 646 languages. The company says the model generates speech 40 times faster than real time even without extra optimization, which is the kind of claim that makes both developers and rivals sit up straight.

The release includes the source code, model weights, and training data, so this is not a teaser or a lab demo wrapped in marketing fluff. It is also a direct shot across the bow of commercial speech tools that usually keep the best parts locked away.

What OmniVoice can do

OmniVoice is built around a simplified architecture, with large language model parameters used to improve quality. Xiaomi says that combination helped the system stay fast while still producing speech that sounds natural and intelligible.

Voice cloning from a short audio sample
Speech generation in 646 languages
Output 40 times faster than real time, without extra optimization
Text-based voice styling, including emotion cues like laughter or sighs
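
To put the speed claim in concrete terms, here is a minimal arithmetic sketch. It assumes "40 times faster than real time" means the ratio of audio duration to generation time; the function names are hypothetical and not part of OmniVoice, and actual throughput will depend on hardware and settings Xiaomi's figure does not spell out here.

```python
def synthesis_seconds(audio_seconds: float, speedup: float = 40.0) -> float:
    """Estimate wall-clock time needed to generate `audio_seconds` of speech.

    `speedup` is the claimed multiple of real time (assumed: audio duration
    divided by generation time).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


def real_time_factor(speedup: float) -> float:
    """Convert a speedup multiple into a real-time factor (lower is faster)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


if __name__ == "__main__":
    # One minute of speech at the claimed 40x speedup.
    print(synthesis_seconds(60.0))   # 1.5
    # The same claim expressed as a real-time factor.
    print(real_time_factor(40.0))    # 0.025
```

Under that reading, a minute of speech would take about a second and a half to generate, and an hour-long recording about 90 seconds.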

OmniVoice performance across 24 and 102 languages

In tests across 24 languages, OmniVoice beat a number of commercial systems on naturalness and clarity. Xiaomi also says that on 102 languages, the model got close to real recordings, which is a striking benchmark for a project that is openly available instead of hidden behind an API gate.

That matters because multilingual voice AI has become a bragging-rights race between big labs and product companies. Open releases like this tend to accelerate everything: research, copycats, and eventually the feature list in competing assistants from the likes of Google, OpenAI, and others that already treat speech as a battleground.

A bigger toolkit than simple dubbing

Xiaomi is not stopping at basic text-to-speech. The model can also tune a voice from a text description, strip noise automatically, and adjust the pronunciation of difficult words and names. In other words, it is aiming at the full messy reality of speech, not just polished lab audio.

The open question is how quickly developers will turn that capability into products people actually use. If OmniVoice holds up outside Xiaomi’s own testing environment, it could push more companies to open up their speech models too – or at least make them explain why their best versions are still behind a paywall.
