Kling’s Custom Face 🎭 Model & The Future of AI Video 🎥
Weekly recap and deep dive into the most compelling story lines.
Happy Sunday!
Let’s go deeper on the most compelling discussion topics raised during the past week:
Kling.AI released a Custom Face feature that got us thinking about the future of AI video. Of all the AI hype trains, the one attached to video might genuinely be the longest and heaviest. Since the original Sora announcement, there has been a seemingly endless cycle from “this is the best thing ever” to “it’s the end of good movies”. Hollywood has weighed in, and it’s generally not too happy about the implications, but that may not matter much if video generation tools keep delivering the production-grade features creators need to bring compelling storylines to life.
But first, here are the stories you definitely don’t want to have missed this week:
ICYMI: Top Headlines from the Week

🚀 With Trump in the White House, AI Companies Could Run Wild
🔍 Perplexity Dove Into Real-Time Election Tracking While Other AI Companies Held Back
👀 The Battle Over Open-Source AI Will Be Settled in Trump’s Presidency
🤖 Magentic One: A Generalist Multi-Agent System for Solving Complex Tasks
📜 OpenAI Defeats News Outlets' Copyright Lawsuit Over AI Training

Diving deeper
Kling’s Custom Face Model & The Future of AI Video
There have been a few announcements this week from major AI video competitors, showing that a top point of focus is coalescing around character consistency.
And it makes sense—if AI video tools seek large-scale adoption beyond niche content creators, they’ll need to provide capabilities that match the needs (and outputs) of large-scale production teams in Hollywood.
Since the 2000s, video has basically taken over the internet (at least in advertising-based channels) because of how deeply it engages viewers. It turns out people would rather watch pictures that move than view static images, and would rather view static images than read text. Big surprise!
But there’s a deeper insight to consider when projecting the future of AI video. What type of videos perform the best?
As of November 10, 2024, the top-grossing movies worldwide are:
Avatar (2009) – $2.92 billion
Avengers: Endgame (2019) – $2.80 billion
Avatar: The Way of Water (2022) – $2.32 billion
Titanic (1997) – $2.26 billion
Star Wars: Episode VII - The Force Awakens (2015) – $2.07 billion
Avengers: Infinity War (2018) – $2.05 billion
Spider-Man: No Way Home (2021) – $1.92 billion
Inside Out 2 (2024) – $1.69 billion
Jurassic World (2015) – $1.67 billion
The Lion King (2019) – $1.66 billion
Now, what’s common to all of these movies? A compelling storyline depicted by relatable characters. If you’ve seen any of them, chances are the title alone conjures an image of the main character in your mind.
Think of the posters for these movies. It’s not the intricate sets or clever plot twists that dominate them; it’s the characters, who take up most of the frame. That’s what draws people in. We are social beings, and the somewhat voyeuristic experience of watching other people act out scenes is deeply enthralling to us.
When we watch movies and observe other people's emotions, our brains undergo a fascinating process that involves several regions working together to interpret and even mirror those emotions. Here’s a breakdown of what can happen in the brain while watching movies:
Mirror Neuron Activation: Mirror neurons, located in parts of the brain associated with motor and sensory functions, activate when we see someone else’s actions or emotions. These neurons "mirror" the observed emotions, helping us feel what others might be feeling and allowing us to empathize and connect with the characters.
Amygdala Response: The amygdala, known for processing emotions like fear and pleasure, is activated when we observe emotional scenes. It plays a role in the emotional intensity of our response, so when characters experience happiness, fear, or sadness, we often feel these emotions strongly as well.
Empathy Networks: The brain's "empathy network," which includes the anterior insula and the anterior cingulate cortex, also plays a crucial role. These areas help us understand and even predict the emotions and intentions of others, making it easier to relate to what a character is experiencing.
Oxytocin Release: Known as the "bonding hormone," oxytocin can be released during moments when we feel connected to the story. It heightens our sense of empathy and makes us feel closer to the characters, even though they are fictional.
Dopamine and Reward Systems: The brain’s reward system, which involves dopamine, is activated during emotionally satisfying scenes, like the resolution of a storyline or when justice is served. This reward system makes watching movies enjoyable and even addictive because it provides a pleasurable response.
Prefrontal Cortex Activation: The prefrontal cortex helps us process and reflect on the emotions we observe. It’s involved in more complex emotional responses, allowing us to appreciate a character’s development, understand plot nuances, and even anticipate what might happen next.
Together, these processes enable us to connect with characters, empathize with their experiences, and feel a range of emotions as though they were our own, making movies an immersive and impactful experience. The power of that experience (and whether or not we are ultimately willing to pay money to have it) hinges heavily on whether or not the movie is able to elicit that emotional response.
Consider a movie format that doesn’t rely on human characters at all. (Animation is of course an exception, and a growing category, but animated characters are generally given strongly human-like traits such as voice, mannerisms, and inflection, even when they depict animals, as in The Lion King.) The highest-grossing nature documentary is "March of the Penguins" (2005), which earned approximately $127 million worldwide.

$127 million is good for about 7% of the tenth highest-grossing film (The Lion King). If you saw the film, you know how good it was at teasing relatable storylines out of the footage, drawing viewers into the hardships the penguins went through, such as the loss of offspring or a mate. The skilled narration helps the viewer invest in what is otherwise just footage of penguins walking around. But without compelling human characters, the ceiling on earnings was always going to be lower than that of the biggest blockbusters.
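A quick back-of-the-envelope check on that comparison, using the grosses from the list above ("about 7%" is rounded):

```python
# Compare "March of the Penguins" gross to the tenth
# highest-grossing film on the list (The Lion King, 2019).
penguins_gross = 127e6    # ~$127 million worldwide
lion_king_gross = 1.66e9  # ~$1.66 billion worldwide

ratio = penguins_gross / lion_king_gross
print(f"{ratio:.1%}")  # prints 7.7%, i.e. roughly the "about 7%" cited above
```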
So it’s clear that AI video tools are going to have to push in this direction to compete for enterprise level markets.
The really interesting thing though, is not that AI video tools make those things possible (they already were, really) — it’s that they have potential to make those things widely accessible.
In the old paradigm of making movies (let’s say the 80s or 90s), to make a blockbuster that’s seen by millions of people, you need a huge budget (and a lot of time, too) to afford several things:
A-list talent that people want to see, like Tom Cruise
A screenwriter
Production-grade cameras to film the talent
Operators with a decade of training to run those complicated cameras
A full crew of union-paid assistants to set up scenes, adjust lighting, etc.
A team of coordinators that arrange schedules, scout and book locations, and manage personnel
A team of editors to arrange the footage into a cohesive storyline
A marketing team to promote the film and make sure people hear about it
That picture of moviemaking is already being slimmed down and intrinsically changed by modern technology. Most movies are shot in studios with actors performing in front of green screens, and skilled technicians can easily place the main character’s face over a body double for certain scenes. But it’s still quite an expensive process: the skillset required to perform those edits is rigorous and highly technical, and the software to execute the ideas is complex and costly. The paradigm shifted to something like:
A screenwriter
Production-grade cameras
Visual effects & CGI specialists
A marketing team
Now, over the last year, it’s become possible for clever creators to design workflows using custom AI tools that produce compelling video content without any of that overhead. YouTube is littered with videos claiming to explain how to make $10k a month from ad revenue alone. It’s obviously not nearly as easy as they make it out to be, but suddenly the paradigm for creating videos seen by many, many people is starting to look like:
A savvy user
Some AI tools
A Wi-Fi connection
A YouTube account
That’s much, much different from where we started. Of course, this level of disruption implies quite a bit of quality reduction in many senses, and perhaps AI videos will generally be rejected by the public. Maybe the tools will never quite reach the ease of use they’re hyped for, or they’ll cost more than the $10-100 spread we currently see across the space, pricing out the majority of people.
But it’s hard to ignore the implications of such a shift: a leveling of creative capacity across all of humanity, so that the distance between IDEA and PRODUCT becomes borderline INSTANT. That’s not a possibility we can afford to dismiss.
If you have time, watch this breakdown of Kling.AI’s new Custom Face feature and decide for yourself if that concept is within reach sooner or later 🤔
Any thoughts?