Google has unveiled Gemini 3.5 Live Translate, an audio model built to translate spoken language almost continuously across more than 70 languages. The pitch is simple: fewer awkward pauses, less robotic phrasing, and a better shot at making live calls, meetings, and broadcasts sound like actual conversations instead of a machine waiting for permission to speak.

The model works speech to speech, processing audio as it arrives and producing translated speech a few seconds later. That is a small delay, but it is also the difference between a usable live translation tool and the old kind that turns every exchange into a stop-start relay race.

How Gemini 3.5 Live Translate handles speech

Google says the system preserves intonation, rhythm, and pitch while translating, which matters more than product demos usually admit. A flat voice can make even accurate translation feel awkward; keeping some of the speaker’s delivery helps the result sound closer to the original, especially in fast-moving settings like online meetings or live lectures.

The company also says the model can handle mixed-language input automatically, without manual language setup, and is designed to work in noisy, unpredictable acoustic conditions. That matters outside the pristine demo room, where most translation tools have historically looked very brave before falling apart the moment a fan started humming.

Google Translate, Meet, and Android get the first rollout

Google Translate is getting the feature globally on Android and iOS when used with headphones. On Android, Google is also adding a listening mode that sends the translation straight to the phone speaker, making it feel more like a normal handset conversation than a translated audio loop.

Google Meet is next in line for Workspace users, starting with a closed test for business customers before a wider release later in the year. That sequencing makes sense: enterprise video calls are where translation can save time immediately, and they also give Google a high-value proving ground before the feature spreads further.

Partners are already building on the API

Google is also exposing the model through the Gemini Live API, which lets third-party developers build their own voice translation and dubbing products. Partners include Agora, Fishjam, LiveKit, Pipecat, and Vision Agents, while Grab is testing the technology for live voice communication between drivers and passengers in a multilingual environment with millions of calls each month.

That push suggests Google wants more than a flashy feature inside its own apps. It wants the infrastructure layer too, and that is where the money usually is. Voice translation is becoming a platform race, with rivals trying to own not just the consumer interface but the plumbing underneath it.

SynthID is the guardrail Google hopes will help

Google says the audio output includes SynthID, an embedded watermark meant to identify synthetic speech. That will not solve every misuse problem, but it is a sensible response to a category that can quickly drift from convenience tool to deepfake headache if the guardrails are flimsy.

The bigger question is adoption speed. If the translation feels natural enough in calls and meetings, users may stop thinking about the feature at all, which is usually the sign a language tool is finally doing its job.

Source: Ixbt
