2023 was the year of generative AI, but more specifically, the year we witnessed the power and potential of LLMs, large language models. Much of the world of work is based around text: documents, email, content, media. Both startups and large tech companies leaned in hard, incorporating automation tools and generative AI applications across verticals.
Visual generative AI made strides as well. Midjourney V6, which was released in December 2023, and OpenAI’s DALL-E 3 both offered a step jump in image creation.
But the next frontier is video. Progress in generative AI technologies for video has also been moving very fast, but it’s often less talked about than text and images, which already have products with broad consumer adoption.
Generative AI in video consists of several buckets:
- Automatic video editing (includes Descript)
- Talking avatars – text to video (includes companies like HourOne, Synthesia, HeyGen)
- Video footage generation (i.e. moving pictures) from a prompt
This post focuses on video footage generation.
Timeline of Generative AI for video progress in 2023
A16Z partner Justine Moore posted an excellent X thread on the advances of generative AI for video right before the end of the year.
As Justine’s timeline shows, the big players in this space are the large tech platforms: Google, Meta and Nvidia in the US and, in China, Bytedance, Alibaba and Baidu. While Google and Meta have shared that they’re working on AI video generation, they’ve yet to release their products to the public.
The large tech players are well positioned to lead in this space given their access to deep learning expertise, unlimited cloud resources and deep pockets. Google Brain recently open-sourced Phenaki, a video diffusion model that points towards YouTube’s internal capabilities. It’s capable of producing a two-minute AI-generated video using a series of prompts. Meta’s Make-A-Video builds on the recent progress made in text-to-image generation technology to enable text-to-video generation. Many other papers in this space were published in 2023.
On the startup front, up-and-coming players like PikaAI and RunwayML offer very short, but high-quality, video creation tools. And then there are open source solutions like Stability.ai’s Stable Video Diffusion, released in November 2023.
RunwayML is targeting Hollywood and AI filmmaking
Another tool worth calling out, which generates videos from images, is FinalFrame. Here’s my video for “Panda bear surfing in Hawaii”
AI that makes everybody dance, using a picture
Justine Moore tracked 21 publicly available products that enable users to generate AI video footage (you can check them out in this Google doc created by Justine). Note that most of the tools generate very short videos (up to 16 seconds).
With sufficient data and compute, photorealistic, interactive video generation seems within reach. As an investor in generative AI / interactive entertainment, I find this an incredibly exciting time for the generative AI video field as these models begin crossing the threshold of usefulness. However, significant challenges remain around bias, misinformation, and intellectual property, in addition to the yet unknown impact of incoming regulation. Also, investors have a tough question to ask: is generative AI a real platform shift, or are we in a bubble?
Addition (Jan 24th) – Google presents Lumiere: A Space-Time Diffusion Model for Video Generation. The paper reports state-of-the-art text-to-video generation results and shows that the design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video in-painting, and stylised generation.