Google Releases Gemini 2.5
Also, image generation is now available directly within GPT-4o in ChatGPT

⚡️ Headlines
🤖 AI
Apple joins AI data center race - Apple is entering the AI data center market, aiming to compete with industry leaders. [Investor's Business Daily].
A definition of vibe coding, or how AI is turning everyone into a software developer - The article explores how 'vibe coding' enables individuals without traditional programming skills to develop software with AI. [Medium].
Google says its new ‘reasoning’ Gemini AI models are the best ones yet - Google introduces Gemini 2.5, its latest AI model designed to enhance reasoning capabilities. [The Verge].
Developers say AI crawlers dominate traffic, forcing blocks on entire countries - AI crawlers are generating excessive web traffic, leading developers to block access from certain countries. [Ars Technica].
Earth AI’s algorithms found critical minerals in places everyone else ignored - Earth AI's technology has discovered significant mineral deposits in previously overlooked locations. [TechCrunch].
Clothing giant H&M will use models' AI-made digital twins, consent included - H&M plans to utilize AI-generated digital twins of its models, with each model's consent obtained. [Inc.].
China floods the world with AI models after DeepSeek’s success - Following DeepSeek's achievements, China is rapidly releasing numerous AI models globally. [Bloomberg].
Microsoft adds ‘deep reasoning’ Copilot AI for research and data analysis - Microsoft enhances its Copilot AI with deep reasoning capabilities aimed at improving research and data analysis. [The Verge].
AI’s coming to the classroom: Brisk raises $15M after a quick start in school - Edtech startup Brisk secures $15 million to expand its AI tools for classrooms. [TechCrunch].
DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI - DeepSeek-V3 achieves 20 tokens per second on a Mac Studio, posing a competitive threat to OpenAI. [VentureBeat].
🦾 Emerging Tech
Natural Humanoid Walk Using Reinforcement Learning - Figure demonstrates a humanoid robot achieving natural walking patterns through reinforcement learning. [Figure.ai].
Fidelity Investments Prepares to Unveil Its Own Stablecoin: FT - Fidelity Investments is in advanced stages of developing a stablecoin to serve as digital cash. [CoinDesk].
🤳 Social Media
YouTube adds mobile subscriber listings, breaks for live streamers - YouTube introduces mobile subscriber listings and lets live streamers take short breaks without ending their stream. [Social Media Today].
🔬 Research
Introducing TxGemma: Open models to improve therapeutics development - Google unveils TxGemma, a collection of open models aimed at enhancing therapeutic development. [Google Developers Blog].
⚖ Legal
Anthropic wins early round in music publishers' AI copyright case - Anthropic prevails in an initial ruling against music publishers alleging AI-related copyright infringement. [Reuters].
🎱 Random
Napster acquired by Infinite Reality for 3D virtual concerts - Infinite Reality acquires Napster to expand into 3D virtual concert experiences. [Variety].
DJs will soon be able to create mixes using the Apple Music catalog - Apple Music integrates with DJ software, allowing DJs to craft mixes using its extensive catalog. [9to5Mac].
GameStop says it will add Bitcoin as a treasury reserve asset - GameStop announces plans to include Bitcoin in its treasury reserves. [CNBC].
🔌 Plug-Into-This
Google has introduced Gemini 2.5, its most advanced AI model yet, emphasizing deeper reasoning and complex problem-solving. The model, released as Gemini 2.5 Pro Experimental, outperforms previous iterations and competitors across key AI benchmarks and is now accessible to developers and users via Google AI Studio and the Gemini app.

Gemini 2.5 is designed as a “thinking model,” focusing on step-by-step processing to enhance accuracy in tasks that require multi-step logic and reasoning.
It outperforms leading models such as GPT-4.5 and Claude 3.5 Sonnet in areas like reasoning, coding, and STEM benchmarks, signaling a leap in foundational AI performance.
The model is already available to Gemini Advanced users, with integration into Vertex AI in progress, offering scalable enterprise-level deployment.
It supports a 1 million token context window, with 2 million tokens coming soon, allowing it to work across extensive documents, datasets, or multi-turn dialogues without loss of coherence (a minimal API sketch follows this list).
A practical showcase demonstrated Gemini 2.5 building a playable video game from a single prompt, illustrating its applied reasoning and generative power in real-world tasks.
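For developers, the experimental model is callable today through the Gemini API. Below is a minimal sketch of pushing a large input through the long context window, using the google-generativeai Python SDK; the model ID gemini-2.5-pro-exp-03-25 matches the announced experimental release but could change, so treat it as an assumption to verify in Google AI Studio.

```python
# Minimal sketch: calling Gemini 2.5 Pro Experimental via the Gemini API.
# Assumes the google-generativeai SDK (pip install google-generativeai) and the
# experimental model ID "gemini-2.5-pro-exp-03-25"; verify the current ID in
# Google AI Studio before relying on it.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")

# The 1M-token window (2M announced) means an entire codebase or document set
# can ride along in a single prompt rather than being chunked and retrieved.
with open("codebase_dump.txt") as f:
    source = f.read()

response = model.generate_content(
    "Here is a codebase:\n" + source + "\n\n"
    "Trace the request-handling path step by step and flag any concurrency "
    "bugs, explaining your reasoning at each step."
)
print(response.text)
```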
1/ Gemini 2.5 is here, and it’s our most intelligent AI model ever.
Our first 2.5 model, Gemini 2.5 Pro Experimental is a state-of-the-art thinking model, leading in a wide range of benchmarks – with impressive improvements in enhanced reasoning and coding and now #1 on
— Sundar Pichai (@sundarpichai)
5:01 PM • Mar 25, 2025
🧠 Gemini 2.5 marks a shift from generative AI to cognitive AI—models not just producing content but reasoning through problems. This aligns with industry trends pushing for agents that can plan, execute, and reflect, setting the stage for more autonomous, intelligent systems in tools, workflows, and research.
OpenAI has integrated its most advanced image generation capabilities directly into GPT-4o, creating a natively multimodal system that excels at producing precise, contextually grounded, and photorealistic images. This update emphasizes utility—such as rendering legible text, following detailed prompts, and visually communicating complex ideas—rather than generating only artistic or abstract visuals.

GPT-4o's image generation is tightly integrated with its language and reasoning abilities, allowing the model to understand, contextualize, and visually represent prompts with high fidelity.
The model is trained on the joint distribution of text and image data, enabling it to render detailed diagrams, infographics, and symbolic imagery that align precisely with user intent.
A standout feature is text rendering within images—useful for signage, labels, and visual storytelling—which previous models struggled to produce accurately (see the API sketch after this list).
GPT-4o can leverage uploaded or in-chat images as visual prompts, making it suitable for tasks like modifying diagrams, illustrating ideas, or transforming existing visuals based on user input.
Safety measures are in place to restrict image generation of realistic humans, minimizing misuse while allowing for creativity and communication through expressive but controlled visuals.
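Native 4o image generation is exposed through ChatGPT at launch, and OpenAI has not yet published an API model ID for it. As a rough sketch of what such a request looks like, here is the existing Images endpoint in the OpenAI Python SDK with DALL·E 3 standing in; the prompt leans on the legible-text rendering this release emphasizes.

```python
# Rough sketch: requesting an image with legible in-image text via the OpenAI
# Python SDK's Images API. GPT-4o's native generation is ChatGPT-only at
# launch, so "dall-e-3" stands in until a 4o-native model ID is published.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="dall-e-3",
    prompt=(
        "An infographic titled 'The Water Cycle' with four clearly labeled "
        "stages: Evaporation, Condensation, Precipitation, Collection. "
        "Spell every label exactly as written."
    ),
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # URL of the generated image
```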
Today's @OpenAI release for images in ChatGPT means
1. It's easy to photoshop a picture in chat
2. Take the before / after, use AI video, & you've got a new way to do animation
The success of these tools will come down to giving artists more control
Artist control is the key🔑
— TheHeroShep (@TheHeroShep)
6:31 PM • Mar 25, 2025
🖼️ By grounding image generation in practical communication tasks—like diagrams, signage, and educational visuals—OpenAI positions GPT-4o not just as a creative tool but as a productivity layer for knowledge work, signaling a shift toward AI systems that bridge expression, explanation, and execution.
In a new blog post, Simon Willison explores the release of Qwen2.5-VL-32B, a 32B vision-language model from Alibaba’s Qwen team. Willison highlights its balance of capability and efficiency, running locally on his 64GB Mac while demonstrating strong performance on image understanding tasks, including a detailed map analysis that impressed him with its accuracy and nuance.

Willison notes the model's practical value as a “sweet spot” size—powerful enough to approach GPT-4-level reasoning while remaining resource-efficient for local deployment.
The Qwen team claims improvements over its predecessors in mathematical reasoning, visual logic deduction, and alignment with human preferences, backed by selective benchmark comparisons in which it outperforms models like Gemma 3-27B and GPT-4o-0513.
Willison tested the 4-bit quantized version using MLX, praising both its accessibility and the quality of the image description it produced from a detailed coastal map (a local-setup sketch follows this list).
The example output demonstrated geographic and semantic precision, identifying protected marine areas, topographic features, and even depth contours with structured clarity.
Community members, like Prince Canuma, quickly released various quantized formats (4-bit to bf16), enabling rapid experimentation and lowering the barrier for local usage of large multimodal models.
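For anyone wanting to replicate Willison's setup, the mlx-vlm package can load one of those community quants directly on Apple silicon. A minimal sketch follows; the Hugging Face repo name and the exact generate() keyword arguments are assumptions, since both have varied across mlx-vlm releases, so check the project README for your installed version.

```python
# Minimal sketch: running a 4-bit quant of Qwen2.5-VL-32B locally on Apple
# silicon with mlx-vlm (pip install mlx-vlm). The repo name and generate()
# signature below are assumptions; both have shifted between releases.
from mlx_vlm import load, generate

model, processor = load("mlx-community/Qwen2.5-VL-32B-Instruct-4bit")

output = generate(
    model,
    processor,
    prompt="Describe this map in detail, including any marked protected areas.",
    image="coastal_map.png",
    max_tokens=512,
)
print(output)
```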
The second big open source (this time Apache 2.0) model release from a Chinese AI lab today is Qwen's Qwen2.5-VL-32B, which appears to be a truly fantastic multi-modal vision model based on my first attempt at running it locally (using MLX-VLM) simonwillison.net/2025/Mar/24/qw…
— Simon Willison (@simonw)
10:54 PM • Mar 24, 2025
🧭 Willison’s deep dive underscores a trend toward practical open-weight vision-language models that prioritize interpretability and deployability, hinting at a future where sophisticated multimodal understanding isn’t just cloud-bound but democratized across personal hardware and open ecosystems.
🆕 Updates
We are introducing answer modes in Perplexity to make the core search product even better for verticals: travel, shopping, places, images, videos, jobs. The next step is to get so precise that you don't have to press on these tabs. Available on web for now. Mobile soon.
— Aravind Srinivas (@AravSrinivas)
4:28 PM • Mar 25, 2025
Circle to Search. Coming shortly to all Android Perplexity users
— Aravind Srinivas (@AravSrinivas)
12:21 AM • Mar 26, 2025
Magic Doodles with #Ray2 Img-to-Vid.
Turn your imagination into animation. Just drop your doodles into #DreamMachine and watch your drawings come to life—scribbles become stories, and characters start to move. What will you bring to life?
— Luma AI (@LumaLabsAI)
3:39 PM • Mar 25, 2025
📽️ Daily Demo
Gemini 2.5 Pro vs DeepSeek-V3-0324
Prompt: make a fully working chess game in HTML in one file
Gemini 2.5 Pro generated 570 lines of code.
DeepSeek V3 (guess how much)... a whopping 2372 lines of code 🔥
— Hamed (@beinghamed)
2:54 AM • Mar 26, 2025
🗣️ Discourse
Gemini 2.5 Pro really is impressive...
"Can you create a simple 3d car simulator with Three.js in a single HTML? Please add clouds, mountains, a road, some trees and a train going around. Make sure it works on mobile."
— mrdoob (@mrdoob)
9:34 AM • Mar 26, 2025
What a day: OpenAI reveals native image gen!
After Gemini 2.5 Pro thinking was released, OpenAI followed suit and released its own native image generation model in GPT-4o.
At this point, it has to be said: hats off to Google, who were faster this time and released their native
— Chubby♨️ (@kimmonismus)
6:29 PM • Mar 25, 2025
Wow.
H&M is making AI clones of 30 models this year for ads and social posts.
The wild part? The models own their digital twins—and can rent them out to other brands, even H&M’s rivals.
— PJ Ace (@PJaccetturo)
4:09 PM • Mar 25, 2025