What We Learned About LLMs in 2024 - from Simon Willison
Also, OpenAI’s “deliberative alignment” and Meta’s plan to populate their platform with AI influencers

⚡️ Headlines
🤖 AI
Alibaba's Cloud Unit Slashes AI Model Prices by Up to 85% – Alibaba's cloud division has significantly reduced the costs of its AI models, aiming to make artificial intelligence more accessible to businesses. [CNBC].
Alipay Introduces AI Image Search to Compete with WeChat – Alipay has launched an AI-powered image search feature, enhancing its 'super app' capabilities in its rivalry with Tencent's WeChat. [South China Morning Post].
AI's Impact on Employment Highlighted in New Data – Recent graphs illustrate the sectors where AI is already affecting jobs, shedding light on automation's role in the workforce. [Fast Company].
OpenAI Misses Deadline for Promised Opt-Out Tool – OpenAI has failed to deliver the opt-out tool it promised by 2025, raising concerns among users about data privacy. [TechCrunch].
AI Uncovers New Details in Centuries-Old Painting – Artificial intelligence has revealed surprising details about Raphael's 'Madonna della Rosa,' suggesting parts may have been painted by another artist. [Earth.com].
🦾 Emerging Tech
BlackRock's Bitcoin Fund Becomes Largest ETF Launch in History – BlackRock's Bitcoin fund has achieved the greatest launch in ETF history, reflecting growing institutional interest in cryptocurrency. [Bloomberg].
AI Technology Aims to Enable Communication with Animals – Researchers are developing AI tools to interpret animal communication, potentially allowing humans to 'talk' to animals. [Axios].
🤳 Social Media
Meta Plans to Integrate More AI Bots into Facebook and Instagram – Meta intends to introduce more AI-generated characters into its platforms to enhance user engagement and attract younger audiences. [New York Magazine].
Judge Blocks Parts of California's Social Media Child Protection Law – A federal judge has blocked key provisions of California's law aimed at protecting children from addictive social media features, citing potential First Amendment violations. [Courthouse News Service].
Child Influencers Face Abuse Amidst Lack of Protections – Investigations reveal instances of abuse among child influencers, highlighting the need for stronger safeguards in the industry. [The New York Times].
🔬 Research
AI 'Hallucinations' Pose Challenges in Scientific Research – The phenomenon of AI 'hallucinations,' where models generate false information, is causing concerns in scientific research. [The New York Times].
New Method Developed to Detect AI-Generated Images – Researchers have introduced a GAN-based approach to detect AI-generated images, enhancing capabilities in identifying synthetic media. [IEEE Xplore].
AI Trialed to Detect Heart Condition Before Symptoms Appear – An AI tool is being trialed to spot atrial fibrillation in patients before symptoms develop, potentially preventing strokes. [BBC News].
⚖ Legal
U.S. Sanctions Russian, Iranian Entities for Election Interference – The U.S. has imposed sanctions on Russian and Iranian groups accused of attempting to interfere in the 2024 presidential election. [NBC News].
U.S. Army Soldier Arrested for Extorting AT&T and Verizon – A U.S. Army soldier has been arrested for allegedly participating in a hacking scheme to extort AT&T and Verizon by selling stolen call records. [KrebsOnSecurity].
🎱 Random
Over 3.1 Million Fake Stars Found on GitHub Projects – An investigation has uncovered over 3.1 million fake 'stars' on GitHub projects, used to artificially boost rankings and visibility. [BleepingComputer].
Pornhub Now Blocked in Most of the U.S. South Due to Age Verification Laws – Pornhub has been blocked in several southern U.S. states following the implementation of strict age verification laws. [404 Media].
🔌 Plug-Into-This
Simon Willison's year-end reflection highlights critical developments in the field of Large Language Models (LLMs) in 2024, with advancements redefining their accessibility, capability, and societal implications.

Local Accessibility: Advanced models now run on consumer-grade hardware, allowing broader experimentation and reducing dependency on cloud services (see the short sketch after this list).
Multimodal Innovations: LLMs have expanded into multimodal capabilities, handling inputs like text, images, and audio, unlocking new applications across industries.
Cost and Efficiency: The cost of using LLMs has plummeted due to increased competition and technical efficiencies, democratizing AI access.
Evaluation Systems: The importance of rigorous evaluation ('evals') has grown, driving meaningful improvements in model performance and reliability.
Synthetic Data Success: Effective use of synthetic data in training has boosted model capabilities, showcasing innovative methods for LLM improvement.
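To make the local-accessibility point concrete, here's a minimal sketch of querying a small open-weights model on a laptop. It assumes the ollama Python client with a small model already pulled (e.g. `ollama pull llama3.2`); it's an illustration of the idea, not code from Willison's post.

```python
# Minimal sketch: run a small open-weights model locally via the ollama client.
# Assumes the ollama daemon is running and a small model has already been pulled
# (e.g. `ollama pull llama3.2`). Purely illustrative, not from the original post.
import ollama

response = ollama.chat(
    model="llama3.2",  # a small model that fits on typical consumer hardware
    messages=[
        {
            "role": "user",
            "content": "Summarize what changed for LLMs in 2024 in one sentence.",
        }
    ],
)
print(response["message"]["content"])
```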
Here's the table of contents for my end-of-year review of things we learned out about LLMs in 2024 - we learned a LOT
— Simon Willison (@simonw)
6:09 PM • Dec 31, 2024
👀 This blog post is worth a close read for its thoughtful analysis of LLM trends and its insights into the challenges and opportunities facing the tech at the core of the recent AI revolution.
Back in December, OpenAI introduced "deliberative alignment," a new training paradigm that enhances language models' safety by teaching them to reason explicitly over human-written safety specifications before responding to prompts. It’s a process that has implications beyond LLMs.

Explicit Reasoning: Models are trained to “reflect” on safety guidelines using chain-of-thought reasoning, ostensibly leading to more deliberate and safer responses (a rough prompting sketch of this idea follows this list).
Improved Safety Compliance: This method is meant to help models adhere more closely to safety policies, chiefly by making them more resistant to jailbreak attacks that try to coerce non-compliant responses.
Enhanced Performance: Models utilizing deliberative alignment outperform previous versions across various safety benchmarks, demonstrating the effectiveness of this approach.
Data Efficiency: By learning safety standards directly in natural language, models have been shown to achieve better decision boundaries with improved data efficiency.
Application in O-Series Models: OpenAI's o-series models have been aligned using this paradigm, showcasing significant advancements in generating safer and more reliable outputs.
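For a rough feel of the idea, here is a toy prompting sketch in which the model is asked to quote the relevant rule from a written policy before answering. To be clear, this is not OpenAI's method (deliberative alignment is a training procedure, not a system prompt), and the policy text and model name below are stand-ins invented for the example.

```python
# Toy illustration of the *idea* behind deliberative alignment: consult a written
# safety policy explicitly before answering. This is a prompting sketch only, not
# OpenAI's training procedure; the policy text here is invented for the example.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SAFETY_SPEC = """\
1. Refuse requests that facilitate physical harm.
2. For borderline requests, name the applicable rule and answer only the safe parts.
"""

def answer_with_policy_check(user_prompt: str) -> str:
    # Ask the model to quote the relevant rule and judge compliance before replying.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model works for this sketch
        messages=[
            {
                "role": "system",
                "content": (
                    "Before answering, quote the relevant rule from this policy "
                    "and state whether the request complies:\n" + SAFETY_SPEC
                ),
            },
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content

print(answer_with_policy_check("How do I safely dispose of old lithium batteries?"))
```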
Today we released a paper on Deliberative Alignment, a method for teaching reasoning models to explicitly think about safety in their chain of thought!
We used this method to align our o1 models, which are our most robust models to date!
openai.com/index/delibera…— Saachi Jain (@saachi_jain_)
9:23 PM • Dec 20, 2024
⚠️ Safety has long been a major concern for anyone watching the AI boom closely…it’s an intriguing idea that teaching a model to reason its way through safety policies can make it more resistant to the kinds of coercion it was vulnerable to before. It makes sense: someone who understands the rules, rather than merely knowing them, can make reasoned judgments in a dynamic context. If the approach is as effective as claimed, it should have implications for areas where physical human safety is at stake, such as self-driving cars and humanoid robots.
Meta is advancing its integration of artificial intelligence by introducing AI-generated characters and profiles across its platforms, aiming to enhance user engagement and entertainment.

AI Character Integration: Meta plans to incorporate AI-generated users with bios and profile pictures into Facebook and Instagram, allowing these AI personas to generate and share content.
Enhanced User Experience: By embedding AI characters, Meta hopes to make its platforms more engaging and entertaining, particularly in targeting a younger demographic to maintain and grow its user base.
Expert Concerns: Some experts caution that the proliferation of AI-generated profiles may lead to the spread of misinformation and a decline in content quality, emphasizing the need for robust safeguards and clear labeling of AI-generated content.
What do you think of Meta adding AI-generated users to Instagram and Facebook?
These accounts have profile pictures, full bios and share content that has been generated with AI.
— CHRIS FIRST (@chrisfirsttt)
10:00 PM • Dec 30, 2024
🤖 Uh oh. Does anyone want this? Does anyone care? Is it really that different from what we already have on Meta’s platforms? It’s been quite a while since the core product could be said to focus on friends and family connecting with each other; the algorithm already tends to push content from meme accounts you don’t know, many of which are probably automated to a large degree anyway…
🆕 Updates
Holy sh*t…
Text-to-CAD AI is here
You can not only generate 3D models but also control their dimensions using just a text prompt.
it’s only the second day of 2025
— el.cine (@EHuanglu)
9:44 AM • Jan 2, 2025
Stanford just launched STORM, a groundbreaking AI research tool! 🌟
Input any topic, and it scans hundreds of websites to craft an article summarizing key findings.
Free Access: storm.genie.stanford.edu— WorldofAI (@intheworldofai)
9:36 PM • Dec 31, 2024
📽️ Daily Demo
GPT-4o is more powerful than you think.
Here are 10 mind-blowing use cases you need to try..
(Number 6 is unreal): ⤵️
1. Handwriting recognition
— Nelly R Q (@nrqa__)
9:40 AM • Jan 2, 2025
🗣️ Discourse
A heavy-lifting robot! 🏋🏼
Imagine a machine that combines the precision of human dexterity with the strength to lift half a ton.
The Guardian GT, developed by @PalladyneAI, is a teleoperated robot with dual seven-foot arms capable of lifting up to 1,000 pounds. 🪨
Each arm… x.com/i/web/status/1…
— Lukas Ziegler (@lukas_m_ziegler)
9:52 AM • Jan 2, 2025
2025 Will Be The Year of AI Agents
Organizations will have 50 - 500 agents that automate various tasks.
These AI agents will
- Talk and perform actions on Enterprise systems
- Automate workflows
- Have an understanding of Enterprise data
- Autonomously perform tasks on… x.com/i/web/status/1…— Bindu Reddy (@bindureddy)
9:27 PM • Jan 1, 2025
text-to-cad two days into 2025. are you ready?
— Dreaming Tulpa 🥓👑 (@dreamingtulpa)
12:39 PM • Jan 2, 2025
Former professional CAD engineer weighing in on this.
This is 100% the future of design. AI has 10-100x'd the productivity of software development but mechanical design is brutally tedious, slow, painful. @zoodotdev can do this for hardware.
In an ideal world the major CAD… x.com/i/web/status/1…
— Andrew Côté (@Andercot)
5:14 AM • Jan 2, 2025