Google launches LLM to generate videos from text, audio input

Companies like OpenAI, Microsoft and Adobe have launched AI chatbots powered by large language models (LLMs) that turn a text input into an image. Google has now gone a step further by releasing VideoPoet, an LLM that can turn text into videos.

To showcase VideoPoet’s capabilities, Google Research has produced a short movie composed of several short clips generated by the model.

How the VideoPoet model works

For the short movie's script, Google says it asked Bard to write a series of prompts detailing a short story about a travelling raccoon. It then generated a video clip for each prompt and stitched the resulting clips together into a final YouTube Short.

“VideoPoet is a simple modelling method that can convert any autoregressive language model or large language model (LLM) into a high-quality video generator,” Google said.

VideoPoet uses a pre-trained MAGVIT V2 video tokenizer and a SoundStream audio tokenizer, which transform images, videos and audio clips of variable length into a sequence of discrete codes in a unified vocabulary.

These codes are compatible with text-based language models, which makes it possible to integrate them with other modalities, such as text. The LLM learns across modalities to predict the next video or audio token in the sequence.
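To make the idea of a unified vocabulary more concrete, here is a minimal Python sketch of one common way to place codes from several tokenizers into a single ID space: give each modality its own contiguous range. The codebook sizes, the offset scheme and the helper names (`build_offsets`, `to_unified`, `from_unified`) are assumptions for illustration only and are not taken from Google's description of VideoPoet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modality:
    name: str
    vocab_size: int  # hypothetical codebook size, not VideoPoet's real value


# Hypothetical sizes chosen only for illustration.
MODALITIES = [
    Modality("text", 1000),
    Modality("video", 4096),
    Modality("audio", 512),
]


def build_offsets(modalities):
    """Assign each modality a contiguous ID range in one shared vocabulary."""
    offsets, start = {}, 0
    for m in modalities:
        offsets[m.name] = start
        start += m.vocab_size
    return offsets, start  # start now equals the total vocabulary size


OFFSETS, TOTAL_VOCAB = build_offsets(MODALITIES)
SIZES = {m.name: m.vocab_size for m in MODALITIES}


def to_unified(modality, codes):
    """Map per-modality codes into the shared vocabulary."""
    size = SIZES[modality]
    if any(c < 0 or c >= size for c in codes):
        raise ValueError(f"code out of range for {modality}")
    return [OFFSETS[modality] + c for c in codes]


def from_unified(token):
    """Recover (modality, local code) from a shared-vocabulary token ID."""
    for m in MODALITIES:
        start = OFFSETS[m.name]
        if start <= token < start + m.vocab_size:
            return m.name, token - start
    raise ValueError("token outside the unified vocabulary")


# One flat sequence: a text prompt followed by video and audio codes.
sequence = (
    to_unified("text", [5, 42])
    + to_unified("video", [0, 4095])
    + to_unified("audio", [7])
)
```

Because every token in `sequence` is a plain integer from one vocabulary, a language model can in principle be trained to predict the next token without caring which tokenizer produced it.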

“A mixture of multimodal generative learning objectives are introduced into the LLM training framework, including text-to-video, text-to-image, image-to-video, video frame continuation, video inpainting and outpainting, video stylisation, and video-to-audio,” the company said, noting that the result is an AI-generated video.
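As a rough illustration of what a mixture of objectives can look like in practice, the Python sketch below samples a task for each training example and prefixes the example with a task marker. The task names come from the quote above; the sampling weights, the `<task:...>` and `<sep>` markers, and the function names are hypothetical and are not confirmed details of VideoPoet's training setup.

```python
import random

# Task names are from the article; the weights are made up for illustration.
TASK_WEIGHTS = {
    "text-to-video": 0.30,
    "text-to-image": 0.20,
    "image-to-video": 0.15,
    "video-frame-continuation": 0.10,
    "video-inpainting-outpainting": 0.10,
    "video-stylisation": 0.05,
    "video-to-audio": 0.10,
}


def sample_tasks(n, seed=0):
    """Draw n task names according to the (hypothetical) mixture weights."""
    rng = random.Random(seed)
    names = list(TASK_WEIGHTS)
    weights = [TASK_WEIGHTS[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


def format_example(task, source_tokens, target_tokens):
    """Build one training sequence: task marker, input, separator, output."""
    return [f"<task:{task}>", *source_tokens, "<sep>", *target_tokens]


# A single text-to-video example with placeholder tokens.
example = format_example("text-to-video", ["a", "raccoon"], ["v17", "v3"])
```

In a sketch like this, one model sees examples from all tasks during training, and the leading marker tells it which kind of output is expected.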

In layman’s terms, VideoPoet integrates many video generation capabilities into a single LLM, rather than relying on separately trained components for each task.

Disclaimer

We strive to uphold the highest ethical standards in all of our reporting and coverage. We at StartupNews.fyi want to be transparent with our readers about any potential conflicts of interest that may arise in our work. It’s possible that some of the investors we feature may have connections to other businesses, including competitors or companies we write about. However, we want to assure our readers that this will not have any impact on the integrity or impartiality of our reporting. We are committed to delivering accurate, unbiased news and information to our audience, and we will continue to uphold our ethics and principles in all of our work. Thank you for your trust and support.
