Google launches Gemini Omni Flash, a conversational video-generation model with avatar mode held back

SiliconFeed Editorial · May 20, 2026


At a glance:

Gemini Omni Flash rolls out at no extra cost on the Gemini app, Google Flow (for Google AI Plus, Pro and Ultra subscribers) and YouTube Shorts, with API access to follow in the coming weeks
The model can ingest any mix of image, audio, video and text, but speech editing and general audio editing are withheld for now
All generated clips are watermarked with Google’s SynthID; avatar creation requires voice recording and number‑reading onboarding

What Google announced at I/O 2026

Google DeepMind unveiled the first member of its Omni family, Gemini Omni Flash, at its I/O 2026 developer conference. The multimodal model is billed as generating and editing video from any combination of image, audio, video and text inputs. Koray Kavukcuoglu, CTO of Google DeepMind and Chief AI Architect at Google, emphasized that the system “combines images, audio, video, and text as input and generates high‑quality videos grounded in Gemini’s real‑world knowledge.” The rollout began the same day in the Gemini app, in Google Flow for Google AI Plus, Pro and Ultra subscribers, and in YouTube Shorts and the YouTube Create app, all at no additional cost to users.

How the Omni model works

Omni’s core innovation lies in its conversational‑editing layer. Users can issue a series of instructions, each building on the previous one, and the model preserves character identity, physics and scene context across turns. According to Google’s announcement post, the model shows an “improved intuitive understanding of physical forces such as gravity, kinetic energy, and fluid dynamics,” which yields more realistic motion. It also draws on Gemini’s world knowledge, allowing prompts that blend scientific explanation (for example, a claymation explainer of protein folding) with creative visual storytelling.
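To make the multi-turn idea concrete, here is a minimal Python sketch, assuming a client that resends the original inputs plus every earlier instruction on each turn. Google has not published the Omni API, so `EditSession`, `add_instruction` and the example file name are hypothetical, and the sketch says nothing about how the model itself keeps identity or physics consistent.

```python
from dataclasses import dataclass, field


@dataclass
class EditSession:
    """Hypothetical client-side edit session (not Google's API).

    Each new instruction is appended to the history, and the full context
    (original inputs followed by every instruction so far) is what a client
    would send on that turn.
    """
    inputs: list[str]  # references to image/audio/video/text inputs
    turns: list[str] = field(default_factory=list)

    def add_instruction(self, instruction: str) -> list[str]:
        """Record a new instruction and return the full context for this turn."""
        self.turns.append(instruction)
        return self.context()

    def context(self) -> list[str]:
        # Original inputs come first, then all instructions in the order given.
        return [*self.inputs, *self.turns]


session = EditSession(inputs=["photo_of_dog.png"])
session.add_instruction("Animate the dog running across a beach")
ctx = session.add_instruction("Now make it rain, same dog, same beach")
print(ctx)
```

Resending the whole history is only one simple way a client could carry context between turns; a real service might instead keep session state on the server.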

Avatar mode and the withheld audio features

Alongside video generation, the Omni family now includes a digital‑avatar capability. Users record their own voice and read a series of numbers aloud to create a personalized avatar that looks and sounds like them in generated clips. Google is, however, deliberately withholding general‑purpose audio and speech editing for now. Kavukcuoglu wrote that the company is still testing these features to ensure responsible deployment, a move that reads as a precaution against non‑consensual deepfakes.

SynthID watermarking and provenance

Every clip produced by Omni carries Google’s SynthID imperceptible digital watermark by default. The watermark can be verified through the Gemini app, Gemini in Chrome, or Google Search. SynthID uses the same C2PA‑based infrastructure that OpenAI adopted earlier in the year, positioning it as a cross‑industry default for AI‑generated visual provenance.

Limits, undisclosed pricing and the competitive landscape

At launch, Flash‑tier clips are capped at 10 seconds, a deployment decision rather than a model limitation. That is short next to OpenAI’s Sora, which OpenAI first demonstrated producing clips of up to 60 seconds. Google has not disclosed the per‑clip cost, the compute footprint, or benchmark scores against rivals such as ByteDance’s Seedance, or against DeepMind’s own Veo 3. Analysts note that the key upcoming proof point will be the API rollout to developers and enterprises, when pricing and longer‑clip tiers are expected to become public.

Looking ahead: what remains undisclosed

The announcement left several details vague: Omni’s architecture relative to Veo 3, the compute cost per generation, benchmark results, and the timeline for enabling full audio and speech editing. The strategic question is whether Omni defines a new product category (multimodal, conversational video editing) or simply folds existing frontier‑video capabilities into a tighter Google ecosystem. The coming weeks, once the API is live, should clarify the model’s commercial positioning and its impact on the fast‑moving AI video market.

Editorial: SiliconFeed is an automated feed; facts are checked against sources, and copy is normalized and lightly edited for readers.

FAQ

When will developers be able to access Gemini Omni Flash via API?

Google said API access for developers and enterprise customers will be available in the coming weeks after the initial consumer rollout. The exact date has not been announced, but the rollout will coincide with the broader Gemini API expansion announced at I/O 2026.

What are the current limitations on video length and pricing for Gemini Omni Flash?

At launch, Flash‑tier clips are limited to 10 seconds each. Google has not disclosed the per‑clip pricing or the compute cost, and it has not revealed how longer clips will be priced under paid tiers. These details are expected to be revealed when the API becomes publicly available.

How does Google ensure provenance for videos generated by Omni?

All Omni‑generated videos carry Google’s SynthID imperceptible digital watermark by default. Users can verify a clip’s origin through the Gemini app, Gemini in Chrome, or Google Search, leveraging the same C2PA‑based watermarking infrastructure adopted by OpenAI earlier this year.


Prepared by the editorial stack from public data and external sources.
