Google launches Gemini Omni Flash, a conversational video-generation model with avatar mode held back
At a glance:
- Gemini Omni Flash rolls out for free on the Gemini app, Google Flow and YouTube Shorts, with API access to follow in the coming weeks
- The model can ingest any mix of image, audio, video and text, but speech‑editing and general audio editing are withheld for now
- All generated clips are watermarked with Google’s SynthID; creating an avatar requires users to record their voice and read a series of numbers aloud
What Google announced at I/O 2026
Google DeepMind unveiled Gemini Omni Flash, the first member of its Omni family, at the I/O 2026 developer conference. Google bills the multimodal model as able to generate and edit video from any combination of image, audio, video, and text inputs. Koray Kavukcuoglu, CTO of DeepMind and Chief AI Architect at Google, emphasized that the system “combines images, audio, video, and text as input and generates high‑quality videos grounded in Gemini’s real‑world knowledge.” The rollout began the same day on the Gemini app, on Google Flow for Google AI Plus, Pro and Ultra subscribers, and on YouTube Shorts and the YouTube Create app, all at no cost to users.
How the Omni model works
Omni’s core innovation lies in its conversational-editing layer. Users can issue a series of instructions, each building on the previous one, and the model preserves character identity, physics, and scene context across turns. According to the announcement blog post, the model shows an “improved intuitive understanding of physical forces such as gravity, kinetic energy, and fluid dynamics,” enabling more realistic motion. It also draws on Gemini’s world knowledge, allowing prompts that blend scientific explanation (e.g., a claymation protein-folding explainer) with creative visual storytelling.
Avatar mode and the withheld audio features
Alongside video generation, the Omni family includes a digital-avatar capability. Users record their own voice and read a series of numbers aloud to create a personalized avatar that looks and sounds like them in generated clips. However, Google is deliberately withholding general-purpose audio and speech editing for now. Kavukcuoglu wrote that the company is still testing these features to ensure responsible deployment, a move interpreted as a precaution against non-consensual deepfake misuse.
SynthID watermarking and provenance
Every clip produced by Omni carries Google’s imperceptible SynthID digital watermark by default. The watermark can be verified through the Gemini app, Gemini in Chrome, or Google Search. SynthID is described as building on the same C2PA-based provenance infrastructure that OpenAI adopted earlier in the year, which positions it as a potential cross-industry default for AI-generated visual provenance.
Limits, undisclosed pricing and the competitive landscape
At launch, Flash‑tier clips are capped at 10 seconds, a deployment decision rather than a model limitation. This is shorter than OpenAI’s Sora, which allows up to 60 seconds per clip. Google has not disclosed the per‑clip cost, compute footprint, or benchmark scores against rivals such as ByteDance’s Seedance or DeepMind’s own Veo 3. Analysts note that the key upcoming proof point will be the API rollout to developers and enterprises, where pricing and longer‑clip tiers will become public.
Looking ahead: what remains undisclosed
The announcement left several details vague: the exact architecture of Omni relative to Veo 3, the compute cost per generation, benchmark results, and the timeline for enabling full-scale audio and speech editing. The strategic question is whether Omni defines a new product category (multimodal, conversational video editing) or simply integrates existing frontier-video capabilities into a tighter Google ecosystem. The coming weeks, once the API is live, will likely clarify the model’s commercial positioning and its impact on the rapidly evolving AI video market.
Prepared by the editorial stack from public data and external sources.