AI Agents: Next Wave 🌊 or Still Far Off 🔭?

Weekly recap and deep dive into the most compelling story lines.

Happy Sunday!

We have three fun deep dives for you to sink your teeth into this week.

OpenAI’s release of MLE-Bench got us thinking about AI agents again, so it felt like a good time to recap that topic and take stock of where things stand. This year’s Nobel Prize winner in Physics is a prominent AI doomer, and public concern around AI and tech has never been higher, especially heading into the election next month. Finally, Meta’s Movie Gen raises some questions about the future of AI video.

But first, here are the stories you definitely don’t want to have missed this week:

Diving deeper

❶

AI Agents: Next Wave 🌊 or Still Far Off 🔭?

Are AI agents still the next wave of AI tech we should be watching?

AI agents have been touted as the next frontier of AI technology. While not yet fully realized, AI agents appear to be progressing rapidly and are attracting substantial investment and research.

AI agent workflows will drive massive AI progress this year—perhaps even more than the next generation of foundation models. This is an important trend, and I urge everyone who works in AI to pay attention to it.

  • AI agents are designed to act autonomously in pursuit of open-ended goals, making long-term plans and using tools.

  • Major tech companies like Microsoft, Google, and OpenAI are heavily investing in AI agent development.

  • The much-touted transition from "co-pilot" to "autopilot" (the point at which many agents gain true autonomous functionality) is expected within the next 24 months.

Insights from the MLE-Bench Release

MLE-Bench, introduced this week by OpenAI, provides valuable insight into the current state of AI agents.

  • It evaluates AI agents' machine learning engineering capabilities across 75 Kaggle competitions.

  • The best-performing setup (OpenAI's o1-preview with AIDE scaffolding) achieved Kaggle's bronze level in 16.9% of competitions.

While showing promise, this indicates that usable agents for complex tasks (like engineering) are still in development and not yet ready for widespread deployment.

OpenAI’s Swarm

Also introduced this week was OpenAI’s Swarm, an educational framework designed to explore lightweight multi-agent orchestration in AI systems. It provides a Python-based platform for coordinating multiple AI agents, focusing on simplicity and flexibility. Swarm is built around the concepts of agents (encapsulating specific tasks) and handoffs (allowing control transfer between agents), enabling complex multi-step processes.

  • Swarm allows for direct integration of Python functions and supports streaming responses, making it suitable for real-time, complex simulations and large-scale data analysis.

  • Unlike single-agent models, Swarm enables collaboration between multiple AI agents, offering more granular control over execution steps and tool calls.

  • The framework is open-source (MIT license) but experimental, encouraging innovation in multi-agent AI systems while not being intended for production environments.

  • Swarm lacks built-in memory management, requiring developers to implement their own solutions, which could be seen as both a limitation and an opportunity for customization.
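To make the agent-and-handoff pattern concrete, here's a minimal sketch in the style of the examples in the Swarm repo. The agent names, instructions, and user message are made up for illustration:

```python
from swarm import Swarm, Agent

client = Swarm()

# A plain Python function used as a handoff: returning another Agent
# transfers control of the conversation to that agent.
def transfer_to_refunds():
    return refunds_agent

triage_agent = Agent(
    name="Triage Agent",
    instructions="Route the user to the right agent.",
    functions=[transfer_to_refunds],
)

refunds_agent = Agent(
    name="Refunds Agent",
    instructions="Help the user process a refund.",
)

# client.run() drives the loop: the triage agent sees the message,
# calls transfer_to_refunds, and the refunds agent produces the reply.
response = client.run(
    agent=triage_agent,
    messages=[{"role": "user", "content": "I'd like a refund for my order."}],
)
print(response.messages[-1]["content"])
```

The appeal is that the orchestration logic stays ordinary Python; the trade-off, as noted above, is that anything like persistent memory is left for you to build.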

Microsoft's “Recall” Debacle

Microsoft was the first Big Tech company to put what it calls an “agent” into its consumer-facing products with the Recall feature. The feature drew a strong initial backlash over privacy concerns, and Microsoft backtracked on its planned rollout and delayed the release.

  • Latest status: Microsoft has implemented security updates for Recall, including opt-in requirements and enhanced data protection measures.

  • Availability: The updated Recall feature is set to become available for Windows Insider Program members in October 2024.

Business Use Cases

There hasn’t been much debate around whether AI agents are potentially useful for businesses (once you start thinking beyond single-step processes, the possibilities are endless), but we have yet to see exactly which tasks are actually feasible, and preferable, to hand off with minimal human oversight.

Here are a few that we can examine:

  • Customer support: Companies like Sierra, Decagon, and Maven AGI are using AI agents to automate responses and streamline processes.

  • Regulatory compliance: Startups like Norm Ai and Greenlite AI leverage AI agents to ensure adherence to complex regulations.

  • Data analysis: AI agents like Delphina are being used to streamline data analysis processes.

  • Negotiations: Pactum's AI agents have been successful in automating supplier negotiations for large distribution companies.

Marketing is another widely hyped potential use case, but it hasn’t yet been proven out and remains more or less divided into single-purpose tools (such as copy generation or image generation).

Apple Intelligence and Consumer Demand

Apple's rollout of Apple Intelligence on October 28th will provide more insights into consumer demand for AI agent-like features:

  • Initial features include:

    • Writing Tools

    • Revamped Siri (it’s about TIME!)

    • Smart Replies

    • Notification Summaries

    • Photo Clean Up

  • Future updates will bring:

    • ChatGPT integration

    • Genmoji

    • Image Playground

It’s still too early to call whether AI agents will live up to their promise, but a cursory read of AI investor sentiment right now should give us pause before getting too excited about the more interesting use cases. Right now, investors are leery of the sheer volume of resources being thrown at AI development, with little to show for it in profits. And AI agent-based products sit on the more complex, resource-intensive side of the AI product world, so we should expect to see a slowdown on those projects until proven demand has been established or GPUs suddenly get a lot cheaper.

If not agents, then what?

Some intriguing AI subtopics that have flown under the mainstream radar so far are:

Quantum AI

  • As quantum computing advances, its integration with AI could lead to exponential increases in processing power, potentially enabling new classes of AI applications.

  • IBM Quantum and Google Quantum AI are examples.

Edge AI

  • Edge AI involves processing AI algorithms directly on edge devices, such as IoT sensors or smartphones, rather than in the cloud. This approach offers faster processing, improved privacy, and reduced bandwidth usage.

  • https://www.ibm.com/topics/edge-ai
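To illustrate the on-device idea in the bullet above, here's a minimal inference sketch using TensorFlow Lite's standalone runtime. The model file and sensor input are hypothetical; the point is simply that the data never leaves the device:

```python
import numpy as np
import tflite_runtime.interpreter as tflite  # lightweight runtime meant for edge devices

# Load a pre-trained, quantized model that ships with the device.
interpreter = tflite.Interpreter(model_path="sensor_classifier.tflite")  # hypothetical model file
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# A fake sensor reading, shaped to match whatever the model expects.
reading = np.random.rand(*input_details[0]["shape"]).astype(np.float32)

# Inference runs locally: no network call, nothing sent to the cloud.
interpreter.set_tensor(input_details[0]["index"], reading)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])
print(prediction)
```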

Research AI

Human-AI Collaboration

What do you think?

~ JL

❷

Growing Negative Public Sentiment Against Tech 🤖👎 — A Real Concern for AI Companies?

The tech industry, once widely applauded as a bastion of life-improving innovations and human progress, is facing increasing scrutiny and backlash from various sectors of society.

Fake News & Deep Fakes: A Digital Pandemic

The proliferation of fake news and deep fakes has become a significant concern for the public, touching a raw nerve in our increasingly digital society. Even the Department of Homeland Security is writing reports on it.

Impact of Fake News

Deep Fakes: A New Frontier of Deception

  • The technology behind deep fakes is becoming increasingly sophisticated and accessible, causing many to be concerned about its potential misuse.

  • Technology associated with the creation of deepfakes has been increasingly targeted by restrictive legislative efforts.

Artists' Opposition to AI

The artistic community has been particularly vocal in its opposition to AI technologies that they perceive as threatening their livelihoods and creative integrity.

  • AI models are trained on artists' work without permission or compensation, devaluing the work of human artists.

  • Copyright infringement appears rampant, and tech companies basically seem to be getting away with it.

Hollywood's Support for SB 1047

Even the well-entrenched entertainment industry rallied behind California's AI safety bill, SB 1047, shortly before Gov. Newsom vetoed it.

Notable Supporters:

  • Over 125 Hollywood actors, directors, producers, and industry leaders signed a letter urging Governor Newsom to sign the bill.

  • Signatories include Mark Hamill, J.J. Abrams, Shonda Rhimes, and SAG-AFTRA President Fran Drescher.

Growing Concerns About OpenAI

OpenAI, once seen as the darling leader in responsible AI development, is now facing increased scrutiny and criticism.

Researcher Departures

Transition to For-Profit Entity

Pressure Mounting to Turn a Profit

  • OpenAI's revenue is growing, but the company is still projected to operate at a loss until 2029 due to the substantial costs associated with developing and scaling sophisticated AI models.

  • Investor pressure is mounting along with the piles of cash they are raising, which supposedly still might not be enough.

Nobel Prize Winner's "Doomerism"

Geoffrey Hinton, a pioneer in AI research and a recent Nobel Prize winner in Physics, has been a prominent (and credible) voice warning about the potential dangers of the AI he helped create.

Key Points:

  • Hinton's shift from AI advocate to critic has lent a lot of credibility to concerns about AI risks.

  • His warnings focus on the potential for AI to become smarter than humans and the uncertainties surrounding AI motivations.

Mothers Against Media Addiction (MAMA)

MAMA represents a grassroots movement of parents concerned about the impact of social media and technology on children's well-being.

Core Beliefs

  • Real-world experiences should remain central to childhood development.

  • Safeguards on social media are essential and overdue.

  • Media literacy is a crucial 21st-century survival skill.

Actions:

Growing Distrust

The growing public sentiment against AI is beginning to pose real challenges to current operations and future growth for AI companies.

Only 39% of U.S. adults believe current AI technologies are safe and secure. This erosion of trust is accompanied by increasing calls for government oversight and regulation, exemplified by the support for California's SB 1047 despite its veto.

In response to these challenges, AI companies are facing pressure to address public concerns and implement more transparent and responsible development practices.

  • 85% support a nationwide effort to make AI safe and secure

  • 52% of employed respondents worry AI will replace their jobs

  • 80% are concerned about AI being used for cyber attacks

  • 85% want industry to share AI assurance practices transparently

  • California's SB 1047 gained significant support for regulating AI development, even from Elon Musk, before being vetoed by Gov. Newsom

AI Companies Ought to Consider Addressing Public Concerns

The growing negative sentiment against AI and tech companies represents a critical moment for the industry, especially as it’s being pressed to find profits.

If companies fail to engender trust, they risk losing the end-user right when they need them most.

Ultimately, we know from history that amazing user experiences and product superiority win the day. But we’ve been living in this new era for a while now, and privacy concerns are only getting more intense with each new AI iteration from the big tech companies.

What do you think?

~ JL

❸

What’s The State of AI Video 🎥 Today?

AI video generation is rapidly evolving in 2024, with some new players arriving and some interesting workflow innovations that could signal better results and efficiency in the near future.

Here’s a super useful table that compares the top tools.

Where the heck is Sora?

OpenAI's Sora, their text-to-video AI model, was first announced in February 2024.

In March 2024, OpenAI's CTO Mira Murati said that Sora would be publicly available "this year" and that it "could be a few months."

Teased Features:

  • Capability to generate hyperrealistic scenes based on text prompts

  • Plans to incorporate audio in future iterations

  • Potential restrictions on producing images of public figures

  • Use of watermarks to distinguish AI-generated content

Sora generated significant buzz in the AI community due to the quality of its outputs. But since then, dozens of alternatives have arisen that create similar quality videos.

Meta's Arrival with Movie Gen

Meta has made a significant entrance into the AI video generation space with the recently announced Movie Gen.

Features:

  • Ability to create high-quality 1080p videos with synchronized audio

  • Instruction-based video editing and personalized content generation

  • A transformer model with 30 billion parameters

  • Training on over 100 million video-text pairs and 1 billion image-text pairs

Revisiting the Tech Behind Video Gen

GANs (Generative Adversarial Networks)

GANs use two neural networks that compete against each other:

  • A generator network creates new data instances

  • A discriminator network evaluates the authenticity of the generated instances

This adversarial process helps in producing more realistic outputs. However, GANs can be challenging to train due to issues like mode collapse, where the generator keeps producing a narrow set of outputs instead of covering the full variety of the training data.
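For the curious, here's a stripped-down sketch of that adversarial loop in PyTorch. The networks, data, and sizes are toy placeholders, nothing like what a real video model would use, but the push-and-pull between generator and discriminator is the same:

```python
import torch
import torch.nn as nn

# Toy networks: in real video models these are large convolutional/transformer nets.
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 784), nn.Tanh())  # generator
D = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 1))              # discriminator

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(32, 784)   # stand-in for a batch of real samples
    noise = torch.randn(32, 16)
    fake = G(noise)

    # 1) Train the discriminator to tell real samples from generated ones.
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Train the generator to fool the discriminator.
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```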

Transformer Models

Transformers are a type of neural network architecture that has revolutionized natural language processing and is now being applied to video generation. Key features include:

  • Ability to process all parts of a sequence simultaneously

  • Use of self-attention mechanisms to capture context

  • Efficient and GPU-friendly processing

Transformer-based models like BERT and GPT have shown remarkable capabilities in generating human-like text and are now being adapted for video generation[13].
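The self-attention mechanism at the heart of these models is surprisingly compact. Here's a minimal single-head, scaled dot-product attention sketch in PyTorch; in a video model the "tokens" would be patches of frames rather than words, and all sizes here are arbitrary toy values:

```python
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention over a sequence."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence_length, dim). Every position attends to every other,
        # so the whole sequence is processed in parallel on the GPU.
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        weights = scores.softmax(dim=-1)  # how much each position looks at the others
        return weights @ v

# Example: a batch of 2 sequences, each 10 "tokens" (or video patches) of dimension 64.
attn = SelfAttention(64)
out = attn(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```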

VAEs (Variational Autoencoders)

Variational Autoencoders (VAEs) are advanced generative models primarily used in unsupervised machine learning. These powerful tools can create new data that closely resembles the input data. The key components of VAEs include:

  • Encoder

  • Decoder

  • Loss function

Here's how those pieces fit together. The encoder compresses input data into a compact representation called latent variables. The decoder then attempts to reconstruct the original data from those variables, often introducing slight variations. A loss function guides the process, ensuring the reconstructed output closely resembles the input while allowing for some creative differences. The mechanism can be likened to reassembling a Lego model with minor alterations to the original design.
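And here's a minimal PyTorch sketch showing those three pieces working together, with a loss that balances reconstruction against a KL regularizer. All sizes are toy values for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, in_dim=784, latent_dim=8):
        super().__init__()
        self.enc = nn.Linear(in_dim, 64)
        self.mu = nn.Linear(64, latent_dim)       # mean of the latent distribution
        self.logvar = nn.Linear(64, latent_dim)   # log-variance of the latent distribution
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, in_dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample the latent variables, which is where
        # the "slight variations" in the reconstruction come from.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    # Reconstruction term: the output should look like the input.
    recon_term = F.mse_loss(recon, x, reduction="sum")
    # KL term: keep the latent distribution close to a standard normal.
    kl_term = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_term + kl_term

vae = TinyVAE()
x = torch.rand(32, 784)  # stand-in for a batch of flattened frames
recon, mu, logvar = vae(x)
print(vae_loss(x, recon, mu, logvar))
```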

Bottlenecks in AI Video Generation

  1. Computational resources: Training large models requires significant computing power and can be extremely expensive.

  2. Data requirements: High-quality, diverse datasets are crucial for training effective models.

  3. Temporal consistency: Maintaining coherence and continuity in longer videos remains challenging.

  4. Ethical concerns: Issues around copyright, deepfakes, and potential misuse of the technology.

Use in Advertising and Marketing

The advertising and marketing sectors are well-positioned to leverage AI-generated videos:

  • AI tools are being used to create personalized video ads and content.

  • Companies like HeyGen offer AI-powered video creation tools specifically for marketing purposes.

However, the use of AI-generated videos in real-world advertisements is still in its early stages. While they offer advantages in terms of cost and speed of production, traditional videos still carry more weight in terms of authenticity and emotional connection with audiences.

The Future for AI Video Generation?

While AI-generated videos are still best seen as niche digital art or a complementary tool for marketers and content creators, their role for professionals will undoubtedly grow in the coming years.

The technology seems more likely to end up integrated into existing video production workflows than to completely replace traditional methods. Adobe is already shipping some unbelievable features that make processes that used to be crazy time-consuming a whole lot easier.

What do you think?

~ JL