Google has introduced Veo 3, its latest video-generating AI model, which can create audio to accompany the clips it generates. Unveiled at the Google I/O 2025 developer conference, Veo 3 can produce sound effects, background noise, and even dialogue for its videos. According to Google, Veo 3 also surpasses its predecessor, Veo 2, in the quality of footage it generates.
Enhanced Features of Veo 3
Veo 3 is now available in Google’s Gemini chatbot app for subscribers to Google’s $249.99-per-month AI Ultra plan. Users can prompt Veo 3 with text or an image, describing characters and environments and suggesting dialogue to customize the sound of the generated clips.
Audio Output Differentiation
One of Veo 3’s standout features is its ability to interpret the raw pixels of its videos and automatically sync generated sounds with the clips. This capability could set Veo 3 apart from other video-generating models on the market.
Here’s a sample clip generated by Veo 3:
(cooking up something tasty for tomorrow… pic.twitter.com/wyIRMsXkFG — Demis Hassabis (@demishassabis) May 19, 2025)
Veo 3’s development was influenced by DeepMind’s earlier work in “video-to-audio” AI, which generates soundtracks for videos. To help guard against deepfakes, DeepMind uses its proprietary watermarking technology, SynthID, to embed invisible markers into the frames Veo 3 generates.
While companies like Google promote Veo 3 as a powerful creative tool, many in the artistic community are concerned about the disruption it could cause to creative industries. The Animation Guild estimates that AI could affect more than 100,000 U.S.-based film, television, and animation jobs by 2026.
Google has also announced new capabilities for Veo 2, including features that give users more control over the generated videos. These capabilities will come to Google’s Vertex AI API platform in the coming weeks.
