Not to be outdone by Meta’s Make-A-Video, Google today revealed its Imagen Video project, an AI system that can generate video clips based on a text prompt (e.g. “a teddy bear washing dishes”).
While the results aren’t perfect (the system’s looping clips have artefacts and noise), Google claims Imagen Video is a step toward a system with a “high degree of controllability” and world knowledge, including the ability to generate footage in a variety of artistic styles. Text-to-video systems are not new: researchers at Tsinghua University and the Beijing Academy of Artificial Intelligence released CogVideo earlier this year, which can translate text into reasonably high-fidelity short clips. Imagen Video nonetheless appears to be a significant advance over the previous state of the art, demonstrating an aptitude for animating captions that existing systems would struggle to understand.