✏️ New AI Benchmarks with...Pictionary?
Exploring how games like Pictionary are being used as innovative benchmarks for evaluating AI capabilities in problem-solving and reasoning.
The Daily Current ⚡️
Welcome, creatives, builders, pioneers, and thought leaders, ever driving further into the liminal space.
NO ELECTION NEWS HERE. If you’re brave enough to use the internet today, I thought I’d make this newsletter entirely free of election-related news. From using Pictionary to assess AI capabilities, to Amazon’s AI-generated recaps enhancing the viewer experience, to Perplexity’s controversial offer during a strike, there are plenty of fun things to think about that aren’t politics.
🔌 Plug Into These Headlines:
AI enthusiasts are turning to games like Pictionary and Minecraft as novel benchmarks for testing large language models’ problem-solving and reasoning abilities. These game-based tests aim to provide more intuitive and “un-gameable” evaluations compared to traditional AI benchmarks, which often rely on rote memorization or cover irrelevant topics. While some researchers see potential in these methods, others remain skeptical about their effectiveness in measuring real-world reasoning skills.

Paul Calcraft developed a Pictionary-like game for AI models to test their problem-solving skills.
Adonis Singh created Mcbench, a tool that tests AI models’ abilities in Minecraft.
These game-based benchmarks aim to be “un-gameable” and force models to think beyond their training data.
Games provide a visual and intuitive way to compare AI model performance compared to text-based benchmarks.
Some researchers are skeptical about Minecraft’s effectiveness as an AI testbed for real-world reasoning.
🙃 Beyond the research value, by turning to familiar games as novel AI benchmarks, these clever researchers are also making the complex world of artificial intelligence more accessible and relatable to the general public.
Amazon’s Prime Video is introducing AI-generated recaps for TV shows and movies. This new feature creates short summaries of previous episodes or seasons, helping viewers quickly catch up on content they may have missed or forgotten. The recaps focus on key plot points and character developments, aiming to enhance the viewing experience and improve content discovery on the platform.

AI-generated recaps will be available for select shows initially, with plans to expand to more titles.
The recaps provide concise summaries of key plot points, character developments, and important events.
This feature is meant to be useful for viewers who have taken breaks between seasons.
The recaps will be available alongside human-written summaries for some titles.
🏃‍♂️ Adding to a growing list of practical AI applications within existing businesses, Amazon is lowering the barrier to re-entry for lapsed viewers, tackling the common problem of audience churn in an increasingly competitive streaming market.
A new study by Ziff Davis executives found that leading AI companies like OpenAI, Google, and Meta rely heavily on content from premium publishers to train their large language models (LLMs).

Analysis of an open-source replication of OpenAI’s OpenWebText dataset used to train GPT-2 showed nearly 10% of URLs came from 15 premium publishers studied.
The findings align with a 2023 News/Media Alliance study showing popular LLM datasets overweight publisher content by a factor of 5 to 100 compared to generic web content.
Ziff Davis conducted the study to educate the industry and inform its own conversations with AI firms, as it has not yet struck major AI deals like some competitors.
🤔 It’s not really a surprising headline: we suspected these companies relied on premium content more than they let on. But it’s important to note, because the degree to which they actually depend on it implies greater leverage for publishers like The New York Times as they continue to fight legal battles over this issue.
Perplexity CEO Aravind Srinivas sparked controversy by offering his AI company’s services to The New York Times during a tech workers’ strike. The offer, made publicly on social media, was widely interpreted as an attempt to replace striking workers, drawing accusations of strike-breaking. Srinivas later clarified that the intention was to provide technical infrastructure support, not to substitute workers with AI. This incident highlights the ongoing tensions between AI companies and traditional media, as well as the ethical concerns surrounding AI’s role in labor disputes.

NYT Tech Guild workers began a strike on November 4, 2024, seeking a 2.5% annual wage increase and other improvements.
Perplexity CEO Aravind Srinivas offered the company’s services to NYT during the strike via a public message on X (formerly Twitter).
The striking workers are responsible for software support and data analysis at NYT.
Srinivas’s offer was widely criticized as an attempt to act as a “scab,” a term for those who undermine strikes by replacing striking workers.
😆 Bit of a slight here, don’t you think? With Perplexity in the midst of a public spat with the NYT, having received a cease-and-desist letter from the paper, its good and gracious CEO extends an olive branch of peace to a publisher now beleaguered by strikes. Sense of humor, or tone deaf?
The adoption of in-vehicle AI faces challenges in Western markets, as revealed by Bosch’s recent study. Despite recognition of AI’s potential role in future mobility, drivers in the US, UK, and Germany express low trust levels in the technology. The research highlights a generational divide in AI acceptance and emphasizes the importance of transparency and reliability in building consumer confidence for successful integration of AI in vehicles.

Only 28% of US drivers trust in-vehicle AI, compared to 32% in the UK and 35% in Germany, according to the 2023 Bosch study.
Privacy and data protection are major concerns for drivers regarding AI in vehicles.
65% of drivers worry about AI making incorrect decisions while driving.
72% of respondents believe AI will play a crucial role in future mobility.
👵👦🚗 The generational divide in AI trust levels indicates that the future of automotive AI adoption could largely depend on the preferences and values of younger drivers entering the market.
Other fun tidbits: