Google unveils Veo, a high-definition AI video generator that can compete with Sora.

Still images taken from videos produced by Google Veo.

Google / Benj Edwards

At Google I/O 2024 on Tuesday, Google announced Veo, a new AI video synthesis model that can create HD videos from text, image, or video prompts, similar to OpenAI's Sora. It can create 1080p videos lasting up to one minute and edit videos with written instructions, but it has not yet been released for widespread use.

Veo reportedly can edit existing videos using text commands, maintain visual consistency across frames, and generate video sequences lasting 60 seconds or more from a single prompt or from a series of prompts that form a narrative. The company says it can create detailed scenes and apply cinematic effects such as time-lapse, aerial shots, and various visual styles.

Since the launch of DALL-E 2 in April 2022, we've seen a parade of new image synthesis and video synthesis models that aim to let anyone who can type a written description create a detailed image or video. While neither technology has been fully perfected, both AI image and video generators are steadily becoming more capable.

Back in February, we covered a preview of OpenAI's Sora video generator, which many believed at the time represented the best AI video synthesis the industry could offer. It impressed Tyler Perry so much that he put the expansion of his film studio on hold. However, OpenAI has not yet provided general access to the tool; instead, it has limited its use to a select group of testers.

Now, Google's Veo seems at first glance to be capable of video generation on par with Sora. We haven't tried it ourselves, so we can only go by the cherry-picked demonstration videos that the company has provided on its website. This means that viewers should take Google's claims with a large grain of salt, as the generation results shown may not be typical.

Examples of Veo's videos include a cowboy on horseback, a high-speed shot down a suburban street, kebabs on the grill, time-lapses of sunflowers blooming, and more. Conspicuously missing are any detailed depictions of humans, which have historically been difficult for AI image and video models to produce without obvious distortions.

Google says Veo builds on the company's previous video generation models, including Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet, and Lumiere. To increase quality and efficiency, Veo uses a compressed “latent” video representation. Google also added more detailed captions to the videos used to train Veo, which the company says allows the AI to interpret prompts more accurately.

Veo is also notable in that it supports filmmaking commands: “When given both an input video and an editing command, such as adding kayaks to an aerial shot of the coastline, Veo can apply this command to the initial video and create a new, edited video,” the company says.

While the demos look impressive at first glance (especially compared to Will Smith eating spaghetti), Google admits that creating AI videos is difficult. “Maintaining visual consistency can be a challenge for video generation models,” the company writes. “Characters, objects, or even entire scenes can flicker, jump, or morph unexpectedly between frames, disrupting the viewing experience.”

Google has tried to mitigate these drawbacks with “cutting-edge latent diffusion transformers,” which is basically meaningless marketing talk without explanation. But the company has enough confidence in the model that it's working with actor Donald Glover and his studio Gilga to create an AI-generated demonstration film that will debut soon.

Initially, Veo will be accessible to select creators through VideoFX, a new experimental tool available on Google's AI Test Kitchen website. Creators can join a waiting list for VideoFX to potentially gain access to Veo's features in the coming weeks. Google plans to integrate some of Veo's capabilities into YouTube Shorts and other products in the future.

No word yet on where Google got the training data for Veo (if we had to guess, YouTube was involved). But Google says it's taking a “responsible” approach with Veo. According to the company, “Videos created by Veo are watermarked using SynthID, our advanced tool for AI-generated content watermarking and identification, and passed through safety filters and memorization checking processes that help reduce privacy, copyright, and bias risks.”
