Google Releases Gemini 2.5

Also, image generation is now available directly in ChatGPT via GPT-4o

⚡️ Headlines

🤖 AI

Apple joins AI data center race - Apple is entering the AI data center market, aiming to compete with industry leaders. [Investor's Business Daily].

A definition of vibe coding, or how AI is turning everyone into a software developer - The concept of 'vibe coding' explores how AI enables individuals without traditional programming skills to develop software. [Medium].

Google says its new ‘reasoning’ Gemini AI models are the best ones yet - Google introduces Gemini 2.5, its latest AI model designed to enhance reasoning capabilities. [The Verge].

Developers say AI crawlers dominate traffic, forcing blocks on entire countries - AI crawlers are generating excessive web traffic, leading developers to block access from certain countries. [Ars Technica].

Earth AI’s algorithms found critical minerals in places everyone else ignored - Earth AI's technology has discovered significant mineral deposits in previously overlooked locations. [TechCrunch].

Clothing giant H&M will use models' AI-made digital twins, consent included - H&M plans to utilize AI-generated digital twins of models, ensuring consent is obtained. [Inc.].

China floods the world with AI models after DeepSeek’s success - Following DeepSeek's achievements, China is rapidly releasing numerous AI models globally. [Bloomberg].

Microsoft adds ‘deep reasoning’ Copilot AI for research and data analysis - Microsoft enhances its Copilot AI with deep reasoning capabilities aimed at improving research and data analysis. [The Verge].

AI’s coming to the classroom: Brisk raises $15M after a quick start in school - Edtech startup Brisk secures $15 million to expand its AI tools for classrooms. [TechCrunch].

DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI - DeepSeek-V3 achieves 20 tokens per second performance on Mac Studio, posing competition for OpenAI. [VentureBeat].

🦾 Emerging Tech

Natural Humanoid Walk Using Reinforcement Learning - Figure demonstrates a humanoid robot achieving natural walking patterns through reinforcement learning. [Figure.ai].

Fidelity Investments Prepares to Unveil Its Own Stablecoin: FT - Fidelity Investments is in advanced stages of developing a stablecoin to serve as digital cash. [CoinDesk].

🤳 Social Media

YouTube adds mobile subscriber listings, breaks for live streamers - YouTube introduces mobile subscriber listings and new features to benefit live streamers. [Social Media Today].

🔬 Research

Introducing TxGemma: Open models to improve therapeutics development - Google unveils TxGemma, a collection of open models aimed at enhancing therapeutic development. [Google Developers Blog].

⚖ Legal

Anthropic wins early round in music publishers' AI copyright case - Anthropic prevails in an initial ruling against music publishers alleging AI-related copyright infringement. [Reuters].

🎱 Random

Napster acquired by Infinite Reality for 3D virtual concerts - Infinite Reality acquires Napster to expand into 3D virtual concert experiences. [Variety].

DJs will soon be able to create mixes using the Apple Music catalog - Apple Music integrates with DJ software, allowing DJs to craft mixes using its extensive catalog. [9to5Mac].

GameStop says it will add Bitcoin as a treasury reserve asset - GameStop announces plans to include Bitcoin in its treasury reserves. [CNBC].

🔌 Plug-Into-This

Google has introduced Gemini 2.5, its most advanced AI model yet, emphasizing deeper reasoning and complex problem-solving. The model, released as Gemini 2.5 Pro Experimental, outperforms previous iterations and competitors across key AI benchmarks and is now accessible to developers and users via Google AI Studio and the Gemini app.

  • Gemini 2.5 is designed as a “thinking model,” focusing on step-by-step processing to enhance accuracy in tasks that require multi-step logic and reasoning.

  • It outperforms leading models such as GPT-4.5 and Claude 3.5 Sonnet in areas like reasoning, coding, and STEM benchmarks, signaling a leap in foundational AI performance.

  • The model is already available to Gemini Advanced users, with integration into Vertex AI in progress for scalable enterprise-level deployment (a minimal API sketch follows this list).

  • It supports a 1 million token context window, with 2 million tokens coming soon, allowing it to work across extensive documents, datasets, or multi-turn dialogues without loss of coherence.

  • A practical showcase demonstrated Gemini 2.5 building a playable video game from a single prompt, illustrating its applied reasoning and generative power in real-world tasks.
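For developers who want to try the model in Google AI Studio, a minimal sketch using the google-generativeai Python SDK might look like the following. The model identifier below is an assumption based on Google's experimental naming at release; confirm the exact string in AI Studio before use.

```python
# Minimal sketch: calling Gemini 2.5 Pro Experimental via the
# google-generativeai Python SDK (pip install google-generativeai).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio

# Assumed model ID; verify the exact identifier in AI Studio.
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")

# A multi-step reasoning prompt, the kind of task the "thinking model"
# is designed to work through step by step.
response = model.generate_content(
    "A train leaves at 9:04 and arrives at 11:32, stopping for 6 minutes "
    "at each of 3 stations. How long is it actually moving? Show your steps."
)
print(response.text)
```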

🧠 Gemini 2.5 marks a shift from generative AI to cognitive AI—models not just producing content but reasoning through problems. This aligns with industry trends pushing for agents that can plan, execute, and reflect, setting the stage for more autonomous, intelligent systems in tools, workflows, and research.

OpenAI has integrated its most advanced image generation capabilities directly into GPT-4o, creating a natively multimodal system that excels at producing precise, contextually grounded, and photorealistic images. This update emphasizes utility—such as rendering legible text, following detailed prompts, and visually communicating complex ideas—rather than generating only artistic or abstract visuals.

  • GPT-4o's image generation is tightly integrated with its language and reasoning abilities, allowing the model to understand, contextualize, and visually represent prompts with high fidelity.

  • The model is trained on the joint distribution of text and image data, enabling it to render detailed diagrams, infographics, and symbolic imagery that align precisely with user intent.

  • A standout feature is text rendering within images—useful for signage, labels, and visual storytelling—which previous models struggled to produce accurately.

  • GPT-4o can leverage uploaded or in-chat images as visual prompts, making it suitable for tasks like modifying diagrams, illustrating ideas, or transforming existing visuals based on user input (see the sketch after this list).

  • Safety measures are in place to restrict image generation of realistic humans, minimizing misuse while allowing for creativity and communication through expressive but controlled visuals.
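The rollout is in ChatGPT first, with developer access expected to follow. If OpenAI exposes 4o image generation through its existing Images API, a request could look like the speculative sketch below; the model name "gpt-4o" is a placeholder assumption, not a confirmed identifier.

```python
# Speculative sketch: 4o image generation via OpenAI's existing Images
# endpoint (pip install openai). Model name is an assumption.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="gpt-4o",  # assumed; check OpenAI's docs for the real model ID
    prompt=(
        "A clean infographic comparing solar vs. wind energy costs, "
        "with a clearly legible title reading 'Cost per MWh'"
    ),
    size="1024x1024",
    n=1,
)
# URL of the generated image (the Images API's URL response format)
print(result.data[0].url)
```

Note the prompt deliberately asks for legible in-image text, the capability this update highlights.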

🖼️ By grounding image generation in practical communication tasks—like diagrams, signage, and educational visuals—OpenAI positions GPT-4o not just as a creative tool but as a productivity layer for knowledge work, signaling a shift toward AI systems that bridge expression, explanation, and execution.

In a new blog post, Simon Willison explores the release of Qwen2.5-VL-32B, a 32B vision-language model from Alibaba’s Qwen team. Willison highlights its balance of capability and efficiency, running locally on his 64GB Mac while demonstrating strong performance on image understanding tasks, including a detailed map analysis that impressed him with its accuracy and nuance.

  • Willison notes the model's practical value as a “sweet spot” size—powerful enough to approach GPT-4-level reasoning while remaining resource-efficient for local deployment.

  • The Qwen team claims improvements over its predecessors in mathematical reasoning, visual logic deduction, and alignment with human preferences, backed by selective benchmark comparisons in which it outperforms models like Gemma 3-27B and GPT-4o-0513.

  • Willison tested the 4-bit quantized version using MLX (see the sketch after this list), praising both its accessibility and the quality of the image description it produced from a detailed coastal map.

  • The example output demonstrated geographic and semantic precision, identifying protected marine areas, topographic features, and even depth contours with structured clarity.

  • Community members, like Prince Canuma, quickly released various quantized formats (4-bit to bf16), enabling rapid experimentation and lowering the barrier for local usage of large multimodal models.
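A local run along the lines of Willison's test can be sketched with the mlx-vlm package. Treat this as a sketch: the generate signature has varied between mlx-vlm releases, and the repo name follows mlx-community's usual naming scheme rather than being confirmed here.

```python
# Sketch: running the 4-bit quantized Qwen2.5-VL-32B locally on Apple
# Silicon with mlx-vlm (pip install mlx-vlm). API details vary between
# mlx-vlm releases; the CLI (python -m mlx_vlm.generate) is equivalent.
from mlx_vlm import load, generate

# Community 4-bit quantization; repo name assumed from mlx-community's
# naming convention.
model, processor = load("mlx-community/Qwen2.5-VL-32B-Instruct-4bit")

output = generate(
    model,
    processor,
    prompt="Describe this map in detail.",
    image="coastal_map.jpg",  # hypothetical local image file
    max_tokens=1000,
)
print(output)
```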

🧭 Willison’s deep dive underscores a trend toward practical open-weight vision-language models that prioritize interpretability and deployability, hinting at a future where sophisticated multimodal understanding isn’t just cloud-bound but democratized across personal hardware and open ecosystems.

🆕 Updates

📽️ Daily Demo

🗣️ Discourse