Meta’s Llama 4 Herd Release

Plus: significant controversy has arisen over Meta's Llama 4 benchmarking. Did they cheat?

⚡️ Headlines

🤖 AI

Krea's founders snubbed postgrad grants from the King of Spain to build their AI startup—now it's valued at $500M - Krea's Spanish founders turned down prestigious academic fellowships to build their AI creative-tooling startup, which has quickly grown to a $500M valuation. [TechCrunch]

OpenAI discussed buying Jony Ive–Sam Altman AI device startup - OpenAI reportedly considered acquiring the hardware startup being developed by Sam Altman and Jony Ive, which focuses on building a novel AI device. [The Information]

AI video startup Moonvalley raised a fresh $43M, SEC filing shows - Moonvalley secures $43 million to advance its AI-powered video creation tools, following the launch of its Marey model. [TechCrunch]

From AI agent hype to practicality: Why enterprises must consider fit over flash - Experts emphasize the importance of aligning AI agents with business needs rather than succumbing to industry hype. [VentureBeat]

From MIPS to exaflops in mere decades: Compute power is exploding, and it will transform AI - The rapid advancement from millions to quintillions of operations per second is set to revolutionize artificial intelligence capabilities. [VentureBeat]

Microsoft releases AI-generated Quake II demo, but admits 'limitations' - Microsoft unveils an AI-generated level of Quake II, showcasing Copilot AI's gaming potential while acknowledging current shortcomings. [TechCrunch]

🦾 Emerging Tech

Cap Raises $11M to Fuel Stablecoin Engine as Industry Heats Up - Cap secures $11 million to develop its yield-bearing stablecoin protocol, aiming for a launch later this year amid growing market interest. [CoinDesk]

🤳 Social Media

X Announces New Requirements for Parody Accounts - X, formerly Twitter, now requires parody accounts to clearly label themselves in both username and bio to avoid impersonation. [Social Media Today]

🔬 Research

Seeing is Believing: Belief-Space Planning with Foundation Models as Uncertainty Estimators - Researchers propose a framework utilizing vision-language models to enhance robotic planning in uncertain environments. [arXiv]

⚖ Legal

Judge calls out OpenAI's “straw man” argument in New York Times copyright suit - A federal judge allowed The New York Times’ lawsuit to proceed, dismissing OpenAI’s argument that the Times’ own reporting undermined its copyright claims. [Ars Technica]

🎱 Random

The rise of 'Frankenstein' laptops in New Delhi's repair markets - Technicians in India are assembling functional laptops from salvaged parts, providing affordable options and reducing e-waste. [The Verge]

🔌 Plug-Into-This

Meta has unveiled the Llama 4 series, introducing advanced multimodal AI models designed to process diverse data types, including text, images, video, and audio. This release aims to enable developers to create more personalized and versatile AI experiences.

  • Llama 4 Scout: A compact model that runs on a single Nvidia H100 GPU, offering a 10-million-token context window and, according to Meta's benchmarks, outperforming competitors like Google's Gemma 3 and Mistral 3.1.

  • Llama 4 Maverick: A larger model comparable in performance to OpenAI’s GPT-4o and DeepSeek-V3 in coding and reasoning tasks, while using fewer active parameters.

  • Llama 4 Behemoth: Currently in training, this model boasts 288 billion active parameters and a total of 2 trillion, aiming to surpass models like GPT-4.5 and Claude Sonnet 3.7 on STEM benchmarks.

  • Mixture of Experts (MoE) Architecture: The Llama 4 series uses MoE to cut compute per token by routing each token to only the relevant expert sub-networks rather than the full model (see the toy routing sketch after this list).

  • Open-Weight Models: While marketed as open-source, the Llama 4 license imposes restrictions on commercial entities with over 700 million users, prompting discussions about the true openness of the models.
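
For readers who want to see what "activating only the relevant expert sub-networks" looks like in code, below is a toy PyTorch sketch of token-level MoE routing: a small router scores each expert per token, and only the top-scoring experts run. The layer sizes, expert count, and top-k value are made up for readability and are not Llama 4's actual configuration.

```python
# Toy mixture-of-experts layer: a router picks the top-k experts per token and
# only those experts run. Sizes below are illustrative, not Llama 4's real config.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=4, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)    # one score per expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                              # x: (n_tokens, d_model)
        scores = self.router(x)                        # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1) # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)           # normalize the kept scores
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                  # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

tokens = torch.randn(8, 64)                            # 8 fake token embeddings
print(TinyMoE()(tokens).shape)                         # torch.Size([8, 64])
```

The efficiency win is that each token only pays for the experts it is routed to, which is how a model like Maverick can keep its active parameter count low relative to its total size.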

🦙 Meta's Llama 4 models are like highly skilled multitaskers, capable of understanding and processing various forms of information simultaneously, making them potentially versatile tools for developers.

Recent observations suggest that Meta may have used a customized version of its Maverick model to achieve higher benchmark scores, raising concerns about the transparency and reliability of such evaluations.

  • Customized Maverick Model: Meta reportedly employed an "experimental chat version" of Maverick for LM Arena benchmarks, differing from the publicly available version.

  • Benchmark Discrepancies: The tailored model's performance may not accurately represent the capabilities of the standard Maverick model.

  • Transparency Issues: Such practices make it challenging for developers to predict model performance in real-world applications.

  • Industry Implications: This situation highlights the need for standardized and transparent benchmarking practices across AI companies.

  • Developer Caution: Developers should critically assess benchmark results and consider possible customizations when evaluating AI models; running a few of your own prompts against the publicly released weights (see the sketch below) is a cheap sanity check.
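
One practical way to act on that caution is to skip the leaderboard and spot-check the released weights yourself. The hedged sketch below sends a handful of your own prompts to Llama 4 Maverick through any OpenAI-compatible provider; the base URL and model id are placeholders, so substitute the values from your provider's documentation.

```python
# Hedged sketch: spot-check the released Maverick weights with your own prompts
# via any OpenAI-compatible provider. The base_url and model id are placeholders;
# use the values from your provider's documentation.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

prompts = [
    "Write a Python function that merges two sorted lists into one sorted list.",
    "Explain in three sentences what a mixture-of-experts model is.",
]

for prompt in prompts:
    resp = client.chat.completions.create(
        model="llama-4-maverick",                    # placeholder model id
        messages=[{"role": "user", "content": prompt}],
        temperature=0,                               # keep outputs comparable across runs
    )
    print("PROMPT:", prompt)
    print("REPLY: ", resp.choices[0].message.content)
    print("-" * 60)
```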

🧐 This scenario underscores the importance of transparency in AI benchmarking: a tailored model can inflate perceived performance and mislead stakeholders. If Meta did in fact take a deceptive path here, it didn't get far with it.

Simon Willison shares his early experiences with Meta's Llama 4 models, highlighting their capabilities and current limitations in practical applications.

  • Context Window Limitations: Despite claims of a 10-million-token context window for Scout, current providers cap it at between 128,000 and 328,000 tokens (see the token-budget sketch after this list).

  • Hardware Requirements: Running these models requires substantial hardware resources, making them less accessible for individual developers.

  • Performance Observations: Initial tests show promising results, but real-world performance may vary based on implementation specifics.

  • Provider Variability: Different providers offer varying levels of support and performance for Llama 4 models.
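
Given the gap between the advertised 10-million-token window and what providers actually serve, it can help to budget tokens before sending a request. The sketch below is one assumed approach: it counts tokens with the Hugging Face tokenizer for Scout (a gated repo that requires accepting Meta's license) and truncates the prompt to a provider limit you supply; the 128k figure and the input filename are stand-ins.

```python
# Hedged sketch: trim a prompt to the context window your provider actually
# serves, not the advertised 10M tokens. The model id is the (gated) Hugging Face
# repo assumed for Scout; PROVIDER_LIMIT is whatever your provider documents.
from transformers import AutoTokenizer

PROVIDER_LIMIT = 128_000      # stand-in for your provider's real cap
RESERVED_FOR_REPLY = 4_000    # leave room for the model's response

# Requires accepting Meta's license on Hugging Face and an authenticated session.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-4-Scout-17B-16E-Instruct")

def fit_to_context(text: str) -> str:
    """Truncate text so prompt + reply stay inside the provider's window."""
    budget = PROVIDER_LIMIT - RESERVED_FOR_REPLY
    ids = tokenizer.encode(text)
    if len(ids) <= budget:
        return text
    return tokenizer.decode(ids[:budget])

long_doc = open("big_report.txt").read()   # hypothetical long input
prompt = fit_to_context(long_doc)
print("prompt tokens:", len(tokenizer.encode(prompt)))
```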

🤔 The disparity between theoretical capabilities and practical implementations of Llama 4 models highlights the challenges in translating cutting-edge AI advancements into accessible tools for broader use.

🆕 Updates

📽️ Daily Demo

🗣️ Discourse