Jun 10, 2026 • 3 min read

Google’s Gemini 3.5 Live Translate targets real-time speech

Google has unveiled Gemini 3.5 Live Translate, an audio model built to translate spoken language almost continuously across more than 70 languages. The pitch is simple: fewer awkward pauses, less robotic phrasing, and a better shot at making live calls, meetings, and broadcasts sound like actual conversations instead of a machine waiting for permission to speak.

The model works speech to speech, processing audio as it arrives and producing translated speech a few seconds later. That is a small delay, but it is also the difference between a usable live translation tool and the old kind that turns every exchange into a stop-start relay race.

How Gemini 3.5 Live Translate handles speech

Google says the system preserves intonation, rhythm, and pitch while translating, which matters more than product demos usually admit. A flat voice can make even accurate translation feel awkward; keeping some of the speaker’s delivery helps the result sound closer to the original, especially in fast-moving settings like online meetings or live lectures.

The company also says the model can handle mixed-language input automatically, without manual language setup, and is designed to work in noisy, unpredictable acoustic conditions. That matters outside the pristine demo room, where most translation tools have historically looked very brave and then fallen apart the moment a fan started humming.

Google Translate, Meet, and Android get the first rollout

Google Translate is getting the feature globally on Android and iOS when used with headphones. On Android, Google is also adding a listening mode that sends the translation straight to the phone speaker, making it feel more like a normal handset conversation than a translated audio loop.

Partners are already building on the API

Google is also exposing the model through the Gemini Live API, which lets third-party developers build their own voice translation and dubbing products. Partners include Agora, Fishjam, LiveKit, Pipecat, and Vision Agents. Grab, meanwhile, is testing the technology for live voice communication between drivers and passengers, a multilingual environment with millions of calls each month.

That push suggests Google wants more than a flashy feature inside its own apps. It wants the infrastructure layer too, and that is where the money usually is. Voice translation is becoming a platform race, with rivals trying to own not just the consumer interface but the plumbing underneath it.

SynthID is the guardrail Google hopes will help

Google says the audio output includes SynthID, an embedded watermark meant to identify synthetic speech. That will not solve every misuse problem, but it is a sensible response to a category that can quickly drift from convenience tool to deepfake headache if the guardrails are flimsy.

The bigger question is adoption speed. If the translation feels natural enough in calls and meetings, users may stop thinking about the feature at all – which is usually the sign a language tool is finally doing its job.

Ava Chen

AI Editor

Ava covers the rapidly evolving world of artificial intelligence, from foundational models and research labs to the real-world economics of intelligence. With a background in computational linguistics, she cuts through the hype to find out what actually works. She firmly believes that benchmarks are just marketing until reproduced in the wild.

via ixbt.com
