Meta is currently showing off an AI video generation tool called Make-A-Video via Twitter. Although the results look pretty horrendous right now, the number of comments in just a day suggests that the AI image generation fad will soon be superseded by AI video generation. It's a big leap, with researchers pushing the boundaries of generative art as we know it, in particular how much data is necessary to bring images to life.
"With just a few words, this state-of-the-art AI system generates high-quality videos from text prompts," Meta AI writes in the tweet, and calls for prompts. The trick to keeping heaps of unregulated gore and porn from being generated and posted on Twitter? Send the prompt to them, and they might post the results.
We’re pleased to introduce Make-A-Video, our latest in #GenerativeAI research! With just a few words, this state-of-the-art AI system generates high-quality videos from text prompts. Have an idea you want to see? Reply w/ your prompt using #MetaAI and we’ll share more results. pic.twitter.com/q8zjiwLBjb (September 29, 2022)
The alternative to waiting for the (likely scarred for life) Meta AI team to potentially select your prompt out of the thousands now piling into the comments is to head over to the Make-A-Video studio and sign up using the Google form to register your interest in the tool.
The accompanying research paper (PDF warning) calls the Make-A-Video process "an effective method that extends a diffusion-based T2I model to T2V through a spatiotemporally factorized diffusion model." That's a fancy way of saying they took a diffusion-based text-to-image (T2I) model and evolved it into a text-to-video (T2V) one, handling space and time in separate steps, to make pictures move.
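For the curious, here's a rough idea of what "factorized" can mean in practice. This is a toy sketch, not Meta's model: it assumes the simplest possible reading of the phrase, where each frame is first filtered on its own (space) and the results are then blended across neighbouring frames (time). The function names, kernels, and the tiny three-frame "video" are all made up for illustration.

```python
# Toy illustration of a space-then-time ("factorized") filter on a tiny video.
# Hypothetical names throughout; this is NOT the Make-A-Video implementation.

def spatial_pass(frame, kernel):
    """Apply a 3x3 kernel to a single frame (list of rows), zero-padded."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        out[y][x] += kernel[dy + 1][dx + 1] * frame[yy][xx]
    return out

def temporal_pass(video, kernel):
    """Apply a length-3 kernel along the time axis at every pixel, zero-padded."""
    t, h, w = len(video), len(video[0]), len(video[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(t)]
    for i in range(t):
        for dt in (-1, 0, 1):
            j = i + dt
            if 0 <= j < t:
                for y in range(h):
                    for x in range(w):
                        out[i][y][x] += kernel[dt + 1] * video[j][y][x]
    return out

def factorized_pass(video, spatial_kernel, temporal_kernel):
    """Space first (frame by frame), then time (across frames)."""
    per_frame = [spatial_pass(frame, spatial_kernel) for frame in video]
    return temporal_pass(per_frame, temporal_kernel)

IDENTITY_2D = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # leaves each frame untouched
IDENTITY_1D = [0, 1, 0]                           # leaves the timeline untouched
SMOOTH_1D = [0.25, 0.5, 0.25]                     # blends neighbouring frames

# Three 2x2 frames: one bright pixel that exists only in the middle frame.
video = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[4.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]

unchanged = factorized_pass(video, IDENTITY_2D, IDENTITY_1D)
smoothed = factorized_pass(video, IDENTITY_2D, SMOOTH_1D)

print(unchanged == video)                      # identity kernels change nothing
print([smoothed[i][0][0] for i in range(3)])   # the bright pixel bleeds into its neighbours
```

Splitting the work this way is cheaper than filtering space and time at once: a full 3x3x3 kernel has 27 weights, while a 3x3 spatial kernel plus a length-3 temporal one has 12. With the temporal kernel set to the identity, the "video" pass does nothing beyond the per-frame image pass, which hints at why an image model makes a convenient starting point.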
"While there is remarkable progress in T2I generation," the paper reads, "the progress of T2V generation lags behind largely due to two main reasons: the lack of large-scale datasets with high-quality text-video pairs, and the complexity of modelling higher-dimensional video data."
Essentially, the datasets needed to train current text-to-video AI models would have to be too large, and too accurately labelled, to be viable.
The amazing thing about this evolution is that "it does not require paired text-video data," the paper notes. That's unlike many video and image generators out there that rely on galleries of content already paired with text. "This is a significant advantage compared to prior work," it explains, as it isn't as restricted and doesn't require as much data in order to work.
There are a few ways to use the tool: it can fill in the motion between two images, add motion to a single image, or create new variations of an existing video. The results are fascinating. They're dreamy and psychedelic, and can be generated in a few different styles.
Sure, these are a little spooky, especially when you remember that the results are only going to get more realistic, but a little hike through the uncanny valley never hurts in the lead-up to Halloween.