Xiaomi’s MiMo model hits 1,000 tokens per second

Xiaomi has pushed its MiMo large language model family into eye-watering territory: MiMo-V2.5-Pro now has an UltraSpeed mode that the company says breaks the 1,000 tokens-per-second barrier. Built with TileRT and designed to run on general-purpose GPUs, the 1-trillion-parameter model is being pitched as a system-and-model co-design win rather than just a raw model upgrade. In plain English: Xiaomi wants bragging rights for speed, and it may actually have them.

The claim lands in a field where speed has become a status symbol again. OpenAI, Anthropic, and Google have spent the past year selling intelligence, but inference latency is now back in the spotlight because nobody enjoys waiting for an answer that could have been typed by a caffeinated intern. Xiaomi is clearly trying to turn that annoyance into a feature, and the numbers are aggressive enough to get noticed.

What Xiaomi says the UltraSpeed mode delivers

Xiaomi says UltraSpeed is about 10 times faster than standard MiMo-V2.5-Pro API access. It also dwarfs MiMo-V2-Flash, which Xiaomi says was already hitting 150 tokens per second when it launched in December 2025. That earlier model could already generate text faster than most people can read aloud; UltraSpeed aims to make the wait between prompt and answer all but disappear.

Model: MiMo-V2.5-Pro UltraSpeed
Scale: 1 trillion parameters
Speed claim: more than 1,000 tokens per second
Deployment: general-purpose GPUs
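To put the throughput figures in wall-clock terms, here is a minimal Python sketch that divides a response length by a constant decode rate. It assumes a steady rate for the whole response and ignores time-to-first-token and network overhead. The roughly 100 tokens-per-second figure for standard API access is only inferred from Xiaomi's “about 10 times faster” claim, and the 4,000-token response length is hypothetical.

```python
# Rough wall-clock time to stream a response at a fixed decode rate.
# ASSUMPTIONS: throughput stays constant for the whole response;
# time-to-first-token and network overhead are ignored.
# The ~100 tok/s "standard" rate is inferred from the "about 10x" claim,
# not a figure Xiaomi published.

THROUGHPUTS_TOK_PER_S = {
    "MiMo-V2.5-Pro UltraSpeed (claimed)": 1000.0,
    "MiMo-V2-Flash at launch (claimed)": 150.0,
    "MiMo-V2.5-Pro standard (inferred)": 100.0,
}


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to emit num_tokens at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 4000  # hypothetical long answer
    for name, rate in THROUGHPUTS_TOK_PER_S.items():
        print(f"{name}: {generation_seconds(response_tokens, rate):.1f} s")
```

Under these assumptions, a 4,000-token answer takes 4.0 seconds at 1,000 tokens per second, about 26.7 seconds at Flash's launch speed, and 40.0 seconds at the inferred standard rate.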

The price for speed is higher

Fast inference is never free, and Xiaomi is not pretending otherwise. UltraSpeed API calls are priced at 3x the standard rate. Standard MiMo-V2.5-Pro pricing is 0.025 yuan per million input tokens on a cache hit, 3 yuan per million input tokens on a cache miss, and 6 yuan per million output tokens. Xiaomi frames the trade-off as a “3x price increase” for a “10x output experience,” which is a neat way of saying that customers who care about throughput will pay up.
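The pricing arithmetic can be sketched as a small cost estimator. This Python example uses the reported standard rates and assumes, since the source only says “3x the standard rate,” that the UltraSpeed multiplier applies equally to cache-hit input, cache-miss input, and output tokens. The function name and the example workload are illustrative.

```python
# Estimate the cost of one MiMo-V2.5-Pro request, in yuan.
# Standard rates are as reported, in yuan per million tokens.
# ASSUMPTION: UltraSpeed's "3x the standard rate" applies equally to
# cache-hit input, cache-miss input, and output tokens.

STANDARD_RATES_YUAN_PER_MILLION = {
    "input_cache_hit": 0.025,
    "input_cache_miss": 3.0,
    "output": 6.0,
}
ULTRASPEED_MULTIPLIER = 3.0


def request_cost_yuan(
    cached_input_tokens: int,
    uncached_input_tokens: int,
    output_tokens: int,
    ultraspeed: bool = False,
) -> float:
    """Return the estimated cost of a single request in yuan."""
    rates = STANDARD_RATES_YUAN_PER_MILLION
    # Tokens times per-million rates, so divide by one million below.
    scaled_cost = (
        cached_input_tokens * rates["input_cache_hit"]
        + uncached_input_tokens * rates["input_cache_miss"]
        + output_tokens * rates["output"]
    )
    cost = scaled_cost / 1_000_000
    return cost * ULTRASPEED_MULTIPLIER if ultraspeed else cost


if __name__ == "__main__":
    # Hypothetical workload: 1M uncached input tokens, 1M output tokens.
    print(request_cost_yuan(0, 1_000_000, 1_000_000))                   # standard
    print(request_cost_yuan(0, 1_000_000, 1_000_000, ultraspeed=True))  # UltraSpeed
```

With these assumptions, 1 million uncached input tokens plus 1 million output tokens cost 9 yuan at standard rates and 27 yuan with UltraSpeed: each token costs three times as much, while the claimed output speed rises roughly tenfold.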

There is one catch: the Token Plan is not supported for UltraSpeed, and access is limited to API trial use. That makes this less of a consumer launch and more of a controlled showcase, which is exactly how companies tend to test expensive inference features before deciding whether the economics are real or just very impressive on a slide.

How Xiaomi is rationing access

Because high-speed inference resources are constrained, Xiaomi is running an application-based trial from June 9 to June 23, 2026. Approval is not guaranteed, and Xiaomi says it will prioritize enterprises and professional developers with genuine business needs. If that sounds selective, that is because it is: this is a capacity management exercise wrapped in a launch announcement.

Approved users get a free two-week Chat experience, but there are limits. Xiaomi says accounts are capped at 10 queue entries per day, sessions max out at 30 minutes, and idle resources are released automatically after 5 minutes. Those guardrails are less glamorous than the 1,000-tokens-per-second headline, but they usually decide whether a fast model can survive outside a demo.
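The trial limits amount to a simple usage policy. The Python sketch below models them with hypothetical names (`TrialAccount`, `try_enqueue`, `session_should_end`) that are not part of any Xiaomi API. It assumes the daily cap resets on each calendar date and that a session ends as soon as it reaches either the 30-minute cap or 5 idle minutes; Xiaomi has not described how it actually enforces these limits.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timedelta

# Limits as reported for the UltraSpeed Chat trial.
MAX_QUEUE_ENTRIES_PER_DAY = 10
MAX_SESSION_LENGTH = timedelta(minutes=30)
IDLE_RELEASE_AFTER = timedelta(minutes=5)


@dataclass
class TrialAccount:
    """Hypothetical per-account tracker; not Xiaomi's implementation."""

    # ASSUMPTION: the daily cap resets on each calendar date.
    entries_by_day: dict[date, int] = field(default_factory=dict)

    def try_enqueue(self, now: datetime) -> bool:
        """Record a queue entry if today's cap still allows one."""
        today = now.date()
        used = self.entries_by_day.get(today, 0)
        if used >= MAX_QUEUE_ENTRIES_PER_DAY:
            return False
        self.entries_by_day[today] = used + 1
        return True


def session_should_end(started: datetime, last_activity: datetime, now: datetime) -> bool:
    """True once the session hits the length cap or the idle timeout."""
    too_long = now - started >= MAX_SESSION_LENGTH
    too_idle = now - last_activity >= IDLE_RELEASE_AFTER
    return too_long or too_idle


if __name__ == "__main__":
    account = TrialAccount()
    t0 = datetime(2026, 6, 9, 10, 0)
    accepted = sum(account.try_enqueue(t0) for _ in range(12))
    print(accepted)  # 10: the last two attempts hit the daily cap
```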

MiMo is becoming a broader platform

MiMo-V2.5-Pro itself launched in April 2026, and it sits inside a family Xiaomi says now spans text, voice, and multimodal capabilities. That matters because model speed alone is not a business strategy; the real prize is whether Xiaomi can turn MiMo into a platform developers keep coming back to instead of a one-off stunt built around a very fast number.

The next test is whether UltraSpeed stays a gated preview or becomes a practical option at scale. If Xiaomi can keep the latency low without pricing itself into a corner, the company may have found a rare advantage: not just a model that talks fast, but one that can make AI feel instant.

Source: 3dnews