May 25, 2023

Create what you dream: The emerging Generative AI video and 3D landscape

Sandhya Hegde

From entertainment and advertising to gaming and AR, video has become the most engaging content format of our times. Businesses, creators, and individuals all use video to tell their stories. However, creating high-quality video and virtual worlds remains one of the most time-consuming and resource-intensive creative tasks. 

Our favorite movies and games take years and hundreds of millions of dollars to produce. While the advent of user-generated content on YouTube, Roblox, and now TikTok has increased access to the medium, it has only made us hungrier for more. How long before anything we dream up can be brought to life with video and shared with the world? 

In this post, we take a look at the latest capabilities of generative AI for media and gaming, and at where we think business disruption will occur in the world of video and 3D.

 

Technical advances: Generating at 24 frames per second

The foundation of generative video lies in text-to-image models. You can trace the evolution of text-to-image in three waves: GANs in the mid-2010s, transformers in 2021, and eventually Stable Diffusion in August 2022. 

ControlNet for diffusion models, released as recently as February 2023, took this further by making it faster and cheaper to generate photorealistic images. In parallel, NeRF (Neural Radiance Field) models have been advancing since 2020, creating 3D models from a small set of images without needing polygonal meshes.  
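
As a concrete illustration of how accessible this layer of the stack has become, here is a minimal sketch that conditions a public Stable Diffusion checkpoint on a Canny edge map using ControlNet via the open-source diffusers library. The model IDs, file names, and prompt below are illustrative choices, not a recommendation:

```python
# Sketch: conditioning Stable Diffusion on a Canny edge map with ControlNet.
# Assumes the Hugging Face diffusers library, opencv-python, and public
# checkpoints; the model IDs, file names, and prompt are illustrative.
import cv2
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Extract edges from a reference image so the generation keeps its composition.
reference = np.array(load_image("reference_frame.png"))
edges = cv2.Canny(reference, 100, 200)
edges = Image.fromarray(np.stack([edges] * 3, axis=-1))

image = pipe("the same scene at night, photorealistic", image=edges).images[0]
image.save("relit_frame.png")
```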

There are significant rendering challenges in computer graphics as we step from generating images to synthesizing video and 3D forms:

  • The need to encode orders of magnitude more information than in images — lighting, viewpoints, textures, etc.  
  • Consistency across frames, both for those properties and for characters and temporal coherence 
  • The lack of multimodal datasets that describe individual frames and the movement across them, or that provide polygonal meshes for virtual worlds  
  • The significant cost of inference for generated video 

There are hundreds of AI researchers, ML engineers, and product development teams working on addressing these barriers to transform media and gaming as we know it. 

Source: Nvidia Research, Video LDM

Nvidia’s Video LDM team leveraged latent diffusion models, turning an image generator into a video generator by introducing a temporal dimension and fine-tuning on videos. They then upsample the output to produce high-resolution videos from text.
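
To make the "introducing a temporal dimension" step concrete, here is a toy PyTorch sketch of a temporal attention layer that lets per-frame features exchange information along the frame axis. It is an illustrative simplification, not Nvidia's implementation, and every name in it is made up:

```python
# Toy sketch of a temporal layer: self-attention across the frame axis lets
# per-frame features share information, which is what keeps motion coherent.
# Illustrative only; the real Video LDM architecture is far richer.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, channels: int, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width), e.g. features produced
        # frame by frame by a frozen image backbone.
        b, t, c, h, w = x.shape
        tokens = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)  # one sequence per pixel
        normed = self.norm(tokens)
        out, _ = self.attn(normed, normed, normed)
        out = (tokens + out).reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)
        return out

frames = torch.randn(2, 8, 64, 32, 32)      # 2 clips, 8 frames, 64 channels
print(TemporalAttention(64)(frames).shape)  # torch.Size([2, 8, 64, 32, 32])
```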

Picsart’s Text2Video-Zero team adds text-guided editing (motion dynamics) to videos using an approach similar to ControlNet. Similarly, Runway ML’s Gen-1 and Gen-2 models focus on user-guided video editing (styling, masking, rendering, etc.) without losing structure and motion. Other projects to watch include Google’s Imagen and Meta’s ImageBind and Make-A-Video. NeRFs (Neural Radiance Fields) use neural networks to generate immersive 3D views of a subject from a set of its images, as opposed to conventional 3D rendering.
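
Some of this research is already packaged in open-source tooling. As a rough sketch, assuming the zero-shot text-to-video pipeline that the diffusers library shipped in 2023 (the model ID, prompt, and frame rate below are illustrative):

```python
# Sketch: zero-shot text-to-video on top of a Stable Diffusion checkpoint.
# Assumes diffusers' TextToVideoZeroPipeline and imageio; the model ID,
# prompt, and frame rate are illustrative.
import imageio
import torch
from diffusers import TextToVideoZeroPipeline

pipe = TextToVideoZeroPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

frames = pipe(prompt="a corgi surfing a wave, cinematic lighting").images
frames = [(frame * 255).astype("uint8") for frame in frames]
imageio.mimsave("corgi_surfing.mp4", frames, fps=4)
```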

Source: Twitter, Kris Kashtanova @icreatelife

Microsoft’s NUWA-XL model uses a “diffusion over diffusion” approach to train on long videos (3,376 frames, about 140 seconds). A global diffusion model generates keyframes across the entire time range, and local diffusion models recursively fill in the content between them. 
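
The hierarchy is easier to see as pseudocode. Below is a heavily simplified sketch of the idea; the model objects and their sample() methods are hypothetical placeholders, not Microsoft's API:

```python
# Heavily simplified sketch of "diffusion over diffusion": a global model
# drafts sparse keyframes spanning the whole clip, then local models
# recursively fill in the frames between neighboring keyframes, conditioned
# on both endpoints. The model objects and their sample() methods are
# hypothetical placeholders, not Microsoft's API.

def generate_long_video(prompt, num_frames, global_model, local_model, num_keyframes=16):
    # 1) Global pass: sparse keyframes across the full duration.
    stride = max(1, num_frames // num_keyframes)
    keyframe_times = list(range(0, num_frames, stride))
    keyframes = global_model.sample(prompt, times=keyframe_times)

    # 2) Local passes: recursively infill between each pair of keyframes.
    def infill(first, last, t0, t1):
        if t1 - t0 <= 1:
            return []
        mid_t = (t0 + t1) // 2
        mid = local_model.sample(prompt, first=first, last=last, time=mid_t)
        return infill(first, mid, t0, mid_t) + [mid] + infill(mid, last, mid_t, t1)

    video = []
    for i in range(len(keyframes) - 1):
        video.append(keyframes[i])
        video.extend(infill(keyframes[i], keyframes[i + 1],
                            keyframe_times[i], keyframe_times[i + 1]))
    video.append(keyframes[-1])
    return video
```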

 

Source: Nvidia Research, Instant NeRFs

How generative video will disrupt creativity

The impact of these open-source and platform advances in video and 3D generation will be manifold in our world. From user-generated videos to Hollywood, gaming and robotics, the next few years will be full of new possibilities that none of us could have imagined as little as five years ago. 

However, unlike in writing, law, accounting, and other similar areas, we strongly believe that generative AI will create more jobs (rather than reduce them) in the media and gaming industries by lowering production costs and decreasing risk.

Imagine a world where hundreds of millions of people can produce videos and artificial worlds rather than a few. Imagine there will be not 10,000 but 1 million new movies and games released every year catering to a more diverse set of human needs and representation than ever before. New markets, social networks, and artistic styles will be born from the revolution in generative video.  

1. The rise of indie gaming and film-making
Platforms built on open-source generative video promise to break the oligopoly of massive gaming platforms like Unity and Unreal, as well as mainstream media production studios. The advent of streaming has already created massive demand for fast-moving niche content that is still challenging to produce today. However, modern studios like A24 Films, which produced Everything Everywhere All at Once for $25M, give us a peek at what the future of entertainment could look like with micro-studios and independent producers. We believe this trend of democratization will continue, further empowering independent storytellers. 

2. 100X CGI/VFX
The beauty of generative AI is that it’s simultaneously a sustaining innovation for incumbents and a disruptive innovation for startup challengers. For established studios, the opportunity is to tell better stories with more creative freedom and lower costs of production, hopefully allowing better profit-sharing models to emerge between studios and creatives. Perhaps Avatar: The Way of Water shouldn’t cost $250M to produce. An exciting new development is the ability to apply styles to existing video, giving creatives the flexibility to make live-action videos seem like animations, easily swap out characters, or reskin certain parts of a video using only text.    

3. Real-time interactive media
As generation gets faster and cheaper, multimodal models can make personalization possible in real-time interactive media. Whether it’s a game, a movie, or an advertisement, we might be able to not just watch but interact with and direct media in real time. In the next five years, we might even see the rise of virtual game characters and movie actors, driven by sophisticated trained models, that studios can hire for their productions. 

Taking these predictions into account, we’ve created this market map of companies across generative video and 3D and the primary categories of businesses that we believe will be most impacted by AI. The market map is segmented based on three categories of business opportunities: consumer, creator, and professional. For the sake of this post, we primarily focus on the professional market segment that represents the most opportunity for enterprise software startups. 

Market Map: Generative AI video and 3D forms

We see four major professional market segments with the potential to spawn dozens of generational businesses each — business content, advertising, media, and gaming. The current state and maturity of products across all these companies are still in the very early days. What’s possible today is just a starting point and a hint at what’s possible in the future.  

Business content straight from Hollywood to LinkedIn

 

A vast majority of business content today is in the form of text — a barrier for the constant education and evangelism companies need to do to help customers get the most out of their products and to train new employees to be successful. By making it easier for businesses to create content and tell stories in multimedia formats, knowledge work can become more engaging. Startups like Synthesia and Elai help over 50,000 business teams around the world create professional, engaging videos from text by offering human-like avatar presenters built on the latest facial-tracking algorithms. 

Personalized advertising that reflects you and your taste

Similar to the surge of personalized text for email marketing, we anticipate a boon for performance marketers and digital advertisers in the making — the ability to serve ads that customers don’t hate! While we’re still in the early days of video personalization, startups like Tavus and Windsor are already helping businesses create personalized video content that engages customers in a totally new way.

Media and entertainment built in your garage last weekend

Building big studio content on an indie budget is going to be a massive new market, and startups like Wonder Dynamics and Monsters Aliens Robots Zombies have been working on making it a reality. Several startups are making traditionally cost-prohibitive techniques accessible to a new cohort of creators, including Rokoko and Plask for motion capture and Puppetry and Drip.art for animation. There are exciting opportunities here across the entire workflow of storyboarding, production, special effects, editing, and marketing — both for photorealistic and animated media. 

Gaming and the metaverse, no headset, no cap

The early focus of startups in 3D gaming has been asset generation, from 3D models and scenes by Kaedim and Luma to textures from Poly. Others like Anything World and Latitude are taking a vertically integrated approach to creating immersive artificial worlds. 

Companies spanning multiple use cases: motion, speech, and video editing

This is the category where we see the most blended use cases, with startups seamlessly being absorbed into existing creative workflows. Runway has been an early leader in AI video editing, starting with inpainting and frame interpolation but expanding into even more capabilities with their Gen-2 AI system, while startups like Deepmotion have focused on reconstructing realistic movement for AR. Rime AI is a promising new entrant in speech synthesis with hundreds of diverse voices and zero latency — making real-time interactive use cases possible across gaming and media. 

Copilot for creativity

Generative AI in video has the potential to open up ideation, production, and editing to many more creatives, streamline professional workflows, and allow for faster, higher-fidelity content.

If you are a founder or a potential founder interested in building an AI company in video and 3D animation, please reach out to sandhya@unusual.vc and alisa@unusual.vc right away. We want to hear from you!
