Google launches LLM to generate videos from text, audio input


Companies like OpenAI, Microsoft and Adobe have launched AI tools that turn a text input into an image. Google has also been in the fray, and it has now taken a step forward by releasing VideoPoet, a large language model (LLM) that can turn text into videos.

To showcase VideoPoet’s capabilities, Google Research has produced a short movie composed of several clips generated by the model. For the script, Google says it asked Bard to write a series of prompts detailing a short story about a travelling raccoon. It then generated a video clip for each prompt and stitched all the resulting clips together to produce the final YouTube Short.

How the VideoPoet model works

“VideoPoet is a simple modelling method that can convert any autoregressive language model or large language model (LLM) into a high-quality video generator,” Google said.

The model uses a pre-trained MAGVIT V2 video tokenizer and a SoundStream audio tokenizer, which transform images, video and audio clips of variable length into sequences of discrete codes in a unified vocabulary.
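
To make the idea of a "unified vocabulary" concrete, here is a minimal Python sketch, assuming each tokenizer emits integer codes from its own codebook and that the shared vocabulary is formed by giving each modality a non-overlapping range of IDs. The codebook sizes, the offset scheme and the function names (`build_offsets`, `to_unified`, `from_unified`) are illustrative assumptions, not VideoPoet's actual layout, and the real tokenizers are replaced by made-up code lists.

```python
from typing import Dict, List, Tuple

# Illustrative codebook sizes only (assumption); not VideoPoet's real configuration.
CODEBOOK_SIZES: Dict[str, int] = {"text": 1000, "video": 8192, "audio": 1024}


def build_offsets(sizes: Dict[str, int]) -> Dict[str, int]:
    """Give each modality a contiguous, non-overlapping range of token IDs."""
    offsets, start = {}, 0
    for modality, size in sizes.items():
        offsets[modality] = start
        start += size
    return offsets


OFFSETS = build_offsets(CODEBOOK_SIZES)
VOCAB_SIZE = sum(CODEBOOK_SIZES.values())


def to_unified(modality: str, codes: List[int]) -> List[int]:
    """Shift a tokenizer's local codes into the shared vocabulary."""
    size = CODEBOOK_SIZES[modality]
    if any(c < 0 or c >= size for c in codes):
        raise ValueError(f"code out of range for {modality}")
    return [OFFSETS[modality] + c for c in codes]


def from_unified(token: int) -> Tuple[str, int]:
    """Recover (modality, local code) from a unified token ID."""
    for modality, size in CODEBOOK_SIZES.items():
        start = OFFSETS[modality]
        if start <= token < start + size:
            return modality, token - start
    raise ValueError("token outside unified vocabulary")


# Hypothetical outputs standing in for the video and audio tokenizers.
video_codes = [5, 17, 8191]
audio_codes = [0, 3, 1023]
sequence = to_unified("video", video_codes) + to_unified("audio", audio_codes)
print(sequence)  # [1005, 1017, 9191, 9192, 9195, 10215]
print([from_unified(t) for t in sequence])
```

Because every ID maps back to exactly one modality, a single language model can read and emit video and audio tokens alongside text without mixing them up.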

These codes are compatible with text-based language models, facilitating integration with other modalities, such as text. The LLM learns across video, image, audio and text modalities to autoregressively predict the next video or audio token in the sequence.
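
The next-token objective can be illustrated with a deliberately tiny stand-in. The sketch below is an assumption for illustration only: it replaces the transformer with a count-based bigram model over unified token IDs and then generates autoregressively, predicting one token, appending it and repeating. The token IDs, the training sequences and the `train_bigram`/`generate` helpers are invented.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigram(sequences: List[List[int]]) -> Dict[int, Counter]:
    """Count which token follows which (a toy stand-in for an LLM)."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts: Dict[int, Counter], prompt: List[int], steps: int) -> List[int]:
    """Autoregressively append the most frequent next token, one step at a time."""
    out = list(prompt)
    for _ in range(steps):
        followers = counts.get(out[-1])
        if not followers:
            break  # no continuation was seen during training
        out.append(followers.most_common(1)[0][0])
    return out


# Hypothetical sequences: text tokens (< 1000) followed by video tokens (>= 1000).
training = [
    [1, 2, 1005, 1017, 1030],
    [1, 2, 1005, 1017, 1030],
    [3, 2, 1005, 1040],
]
model = train_bigram(training)
print(generate(model, [1, 2], steps=3))  # [1, 2, 1005, 1017, 1030]
```

A real LLM conditions on the entire preceding sequence rather than only the last token, and usually samples from a learned probability distribution, but the generation loop has the same shape: the text prompt goes in first and video tokens are produced one after another.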

“A mixture of multimodal generative learning objectives are introduced into the LLM training framework, including text-to-video, text-to-image, image-to-video, video frame continuation, video inpainting and outpainting, video stylisation, and video-to-audio,” the company said, noting that the result is an AI-generated video.
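
One common way to fold several tasks into a single training framework is to prefix each sequence with a task token, follow it with the conditioning tokens, and compute the loss only on the tokens the model must generate. The sketch below assumes that scheme. The special-token IDs, task names, the `Example` type and the `build_sequence` helper are hypothetical and are not taken from Google's description of VideoPoet.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical special-token IDs, placed above the modality ID ranges.
TASK_TOKENS = {
    "text_to_video": 20_000,
    "image_to_video": 20_001,
    "video_to_audio": 20_002,
}
BOS, SEP, EOS = 20_100, 20_101, 20_102


@dataclass
class Example:
    task: str
    condition: List[int]  # e.g. text tokens, or image/video tokens
    target: List[int]     # video or audio tokens the model must predict


def build_sequence(ex: Example) -> Tuple[List[int], List[int]]:
    """Return (tokens, loss_mask); the mask is 1 only on tokens to be predicted."""
    prefix = [BOS, TASK_TOKENS[ex.task]] + ex.condition + [SEP]
    suffix = ex.target + [EOS]
    tokens = prefix + suffix
    mask = [0] * len(prefix) + [1] * len(suffix)
    return tokens, mask


# Two different tasks that can share one batch and one set of weights.
batch = [
    Example("text_to_video", condition=[11, 12], target=[1005, 1017]),
    Example("video_to_audio", condition=[1005, 1017], target=[9192, 9195]),
]
for ex in batch:
    tokens, mask = build_sequence(ex)
    print(ex.task, tokens, mask)
```

With this layout, text-to-video and video-to-audio examples can be trained side by side in one model, which is the sense in which a single LLM can cover many tasks instead of relying on a separate model per task.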

In layman’s terms, VideoPoet integrates many video generation capabilities into a single LLM, rather than relying on separately trained components that each specialise in one task.
