Google’s Gemini Omni Flash turns text and photos into video

Google has launched Gemini Omni, a new family of generative AI models built to create content from text, photos, audio, and other inputs. The first model, Gemini Omni Flash, can generate short videos from those sources and edit existing clips through natural-language prompts, putting Google in a tighter race with OpenAI, Runway, and ByteDance’s video tools.

The pitch is simple enough: give the model almost anything, and it will try to turn it into a coherent scene. The harder part is making the physics believable, the motion consistent, and the output useful beyond novelty. Google says Omni leans on Gemini’s broader world knowledge to keep gravity, liquid movement, and scene logic intact, which is exactly the sort of brag every video AI company has to make now.

What Gemini Omni Flash can do

Google says Gemini Omni Flash can transform one video into another, not just spit out a fresh sequence from a text prompt. That matters because editing is where generative video starts to look less like a demo and more like a workflow, especially if the system preserves character actions and scene continuity across follow-up requests.

At the moment, the system can produce videos with sound that run up to 10 seconds, according to Dumitru Erhan, a senior director of research at Google DeepMind. That ceiling is still short, but it fits the wider pattern in AI video: first come the clipped, polished samples, then the longer runtimes that turn spectacle into something creators can actually build with.

Avatars, voice and SynthID labels

The model also lets users create a digital avatar and speak through it using their own voice. Google says that sort of personalisation was a frequent request from users of its image model, Nano Banana, which it says has produced more than 50 billion images. That kind of number tells you two things: people love putting themselves into generated content, and platforms love anything that keeps them inside the app.

Safety controls are doing some heavy lifting too. Google is limiting the model’s ability to alter someone else’s speech in video, and all generated clips are automatically marked with an invisible SynthID watermark for authenticity checks. That is the right move, though not exactly a glamorous one; the industry has learned the hard way that AI video without provenance is a fast track to nonsense.

Gemini Omni Flash availability in Gemini, Flow and YouTube apps

Gemini Omni Flash is already available globally to Google AI Plus, Pro and Ultra subscribers through the Gemini app and Google Flow. Starting this week, free access is also opening up in YouTube Shorts and the YouTube Create app, a clear attempt to put the model where everyday creators already spend their time.

Google still plans to add audio output and static image generation later. If the company delivers longer clips without breaking scene logic, Omni could move from a showcase feature to a real competitor in short-form video production. If it cannot, the industry will keep treating it the same way it treats most AI video launches: impressive, briefly, and then immediately compared with the next one.

Source: 3dnews
