Sergey Brin on the Future of AI & Gemini

Brief Summary

This video features a conversation with Sergey Brin, focusing on Google's AI advancements showcased at I/O. The discussion covers initial reactions to the announcements, the focus on Gemini's core text model, the integration of native audio in Gemini and Veo, and the surprising interpretability of current AI models. Brin shares insights from model training runs, reflects on how current AI developments compare to past expectations, and discusses the evolution of model training techniques. The conversation also touches on the future of reasoning and Deep Think, Google's startup culture, and the acceleration of AI innovation.

  • Gemini's core text model is the primary focus for enabling self-improvement and better AI science.
  • Native audio integration in Gemini and Veo significantly enhances the realism and usability of generated media.
  • Post-training is becoming increasingly important in model development, facilitating tool use and other advanced capabilities.
  • Google's culture fosters innovation, allowing it to quickly adapt and lead in the AI domain.

Initial reactions to I/O

The initial sentiment surrounding Google I/O was overwhelmingly positive, both externally and within the company. A significant amount of progress has been made across various models and products. Even those deeply involved, like Sergey Brin, were surprised by some of the announcements, such as the virtual fit feature in Google Search. The challenge now is to ensure the smooth delivery and implementation of these new features, allowing users to fully explore and understand their capabilities.

Focus on Gemini’s core text model

Sergey Brin is primarily focused on Gemini's core text model because he believes it is fundamental for enabling self-improvement and advancing the science behind AI. While generative media like images and videos are impressive, the core text model is crucial for coding and developing AI further. Brin relies on Gemini for coding and math tasks, highlighting its growing utility.

Native audio in Gemini and Veo 3

The integration of native audio in Veo 3 is a significant advancement, making video generation more practical and less gimmicky. Adding audio enriches the perception of the output and makes the generated content more compelling. Native audio support has existed in Gemini's base model for about a year, but it took time to refine and ship. Veo uses diffusion-based audio generation, similar to its video generation process.

Insights from model training runs

Watching the training runs of models like Veo provides insights into their development. Intermediate checkpoints allow developers to assess the model's trajectory and progress. Testing these checkpoints helps gauge the model's performance and potential early in the training process.
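The checkpoint-driven workflow described above can be sketched in miniature. This is a toy stand-in, not Veo's actual pipeline: the model is a single weight fit to a 1-D least-squares problem, and `train_step`, `evaluate`, and `train_with_checkpoints` are hypothetical names. The point is the pattern of saving intermediate checkpoints and evaluating each one to read the model's trajectory early in the run.

```python
def train_step(weight, example):
    """One gradient-descent step on a toy 1-D least-squares problem."""
    x, y = example
    grad = 2 * (weight * x - y) * x
    return weight - 0.05 * grad

def evaluate(weight, dataset):
    """Mean squared error of the current checkpoint on the dataset."""
    return sum((weight * x - y) ** 2 for x, y in dataset) / len(dataset)

def train_with_checkpoints(dataset, steps=100, checkpoint_every=20):
    """Train, saving and evaluating an intermediate checkpoint periodically."""
    weight = 0.0                  # initial parameters
    checkpoints = []              # (step, weight, eval_loss) triples
    for step in range(1, steps + 1):
        weight = train_step(weight, dataset[step % len(dataset)])
        if step % checkpoint_every == 0:
            checkpoints.append((step, weight, evaluate(weight, dataset)))
    return checkpoints

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # target function: y = 2x
for step, w, loss in train_with_checkpoints(data):
    print(f"step {step}: weight={w:.3f} eval_loss={loss:.6f}")
```

A falling eval loss across checkpoints is the early signal that the run is on a good trajectory; a flat or rising curve lets developers intervene long before training finishes.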

Surprises in current AI developments vs. past expectations

Current AI developments are unfolding faster than expected, to the point that past predictions now look conservative. That language models would lead the way in AI development was not necessarily foreseen 15 years ago. The interpretability of these thinking models is also surprising, offering insight into their reasoning processes, which provides a degree of comfort from a safety standpoint.

Evolution of model training

The architecture of different models, including video diffusion models like Veo, shares a surprising amount of similarity, often using transformers at their core. Tool use and other advanced capabilities are primarily added during post-training, which is becoming an increasingly significant part of the overall training process. Post-training, including reinforcement learning, is evolving from a small shaping step to a more substantial phase that greatly enhances the model's power.
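The two-phase pattern described above can be illustrated with a toy sketch, assuming nothing about any real training stack: pre-training fits a base distribution to data, then a reinforcement-style post-training step reweights it with a reward signal. All names, the two-token "vocabulary", and the reward function are illustrative.

```python
import math
from collections import Counter

def pretrain(corpus):
    """Base model: empirical token probabilities estimated from the corpus."""
    counts = Counter(corpus)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def post_train(base, reward, temperature=1.0):
    """Reward-weighted reweighting of the base policy -- one exact
    improvement step in the spirit of RL fine-tuning: each probability
    is scaled by exp(reward / temperature), then renormalized."""
    weights = {tok: p * math.exp(reward(tok) / temperature)
               for tok, p in base.items()}
    z = sum(weights.values())
    return {tok: w / z for tok, w in weights.items()}

corpus = ["tool_call", "tool_call", "plain_text", "plain_text", "plain_text"]
base = pretrain(corpus)
# Hypothetical reward that prefers responses invoking a tool.
tuned = post_train(base, reward=lambda tok: 1.0 if tok == "tool_call" else 0.0)
print("base:", base)
print("tuned:", tuned)
```

The post-training step shifts probability mass toward rewarded behavior (here, tool use) without retraining from scratch, which is the sense in which a small shaping phase can substantially change what the model does.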

The future of reasoning and Deep Think

Deep Think represents a convergence of different approaches to reasoning, yielding stronger results. Allowing models to think for extended periods, such as hours or days, can significantly improve the quality of answers to complex questions. This capability is new and valuable, much as long context expanded what models can take as input. The challenge lies in enabling models to generalize and develop solutions over extended periods, moving beyond short, simple tasks.

Google’s startup culture and accelerating AI innovation

Companies need to periodically reinvent themselves, and Google is well-equipped to lead in the AI domain given its history with large-scale data analysis and machine learning. The launch of Gemini 2.5 Pro marked a significant leap forward, and subsequent iterations like 2.5 Flash have further solidified Google's position. The rapid pace of innovation and the company's underlying science engine are driving this momentum; internally, Google feels like a startup, fostering rapid iteration and progress.

Closing

The discussion concludes with appreciation for the team's hard work and a gift presentation: a TPU V4, symbolizing the hardware that powers Google's AI advancements.


© 2024 BriefRead