Discover Google's new AI updates: Enhanced Gemini 1.5 Pro, speedy Gemini Flash, Veo video model for innovative content creation, and customizable Google Gems.
OpenAI held its Spring Update yesterday, and everyone was waiting to see how Google would respond today at its Google I/O 2024 developer conference. The wait is over. Let's explore the updates.
Improved Gemini 1.5 Pro and New Gemini 1.5 Flash
Google has rolled out enhancements to its Gemini AI assistant, with an improved Gemini 1.5 Pro and the new Gemini 1.5 Flash model.
Gemini 1.5 Pro has received quality improvements in translation, coding, reasoning, and other key areas, effective today. These updates are designed to help users handle more complex tasks.
Gemini 1.5 Flash is a more compact model tailored for tasks that require quick response times, which makes it suited to high-frequency or narrowly scoped workloads.
Both models are currently available in preview in over 200 countries and territories, with a general release scheduled for June. They feature a 1 million token context window that supports multimodal inputs, including text, images, audio, and video. For those who need more, a 2 million token context window for 1.5 Pro is accessible via a waitlist in Google AI Studio or through Vertex AI for Google Cloud customers.
Gemini 1.5 Pro Capabilities
This new version will be available to Gemini Advanced subscribers in over 150 countries and supports more than 35 languages. Gemini 1.5 Pro is designed to handle complex tasks more efficiently and includes several new features to boost productivity.
One and Two Million Tokens
One significant upgrade is the 1 million token context window, which expands how much data Gemini can analyze at once. This allows Gemini to analyze large documents and summarize lengthy emails, and it will soon help process video content and complex codebases. Additionally, users can now upload documents directly from Google Drive or their devices, making it easier to gather insights while keeping their data private. A 2 million token context window is available via waitlist.
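To get a feel for what these window sizes mean in practice, here is a minimal sketch that estimates whether a piece of text would fit. It is not Google's tokenizer: the ratio of roughly 4 characters per token is an assumed rule of thumb for English text, and the names `CHARS_PER_TOKEN`, `estimate_tokens`, `fits`, and `reserved_for_output` are hypothetical helpers made up for this illustration. Only the 1 million and 2 million figures come from the announcement.

```python
# Rough estimate of whether a text fits in a context window.
# ASSUMPTION: ~4 characters per token. This is a heuristic, not the
# real Gemini tokenizer, so treat results as ballpark figures only.
CHARS_PER_TOKEN = 4

# Window sizes mentioned in the article.
CONTEXT_WINDOWS = {
    "1M": 1_000_000,  # standard window for 1.5 Pro and 1.5 Flash
    "2M": 2_000_000,  # waitlist-only window for 1.5 Pro
}


def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text`, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits(text: str, window: str, reserved_for_output: int = 8_000) -> bool:
    """Return True if the estimated prompt tokens, plus a hypothetical
    budget reserved for the model's reply, fit in the chosen window."""
    budget = CONTEXT_WINDOWS[window]
    return estimate_tokens(text) + reserved_for_output <= budget


if __name__ == "__main__":
    document = "x" * 4_000_000  # stand-in for a ~4 MB plain-text file
    print("estimated tokens:", estimate_tokens(document))
    print("fits in 1M window:", fits(document, "1M"))
    print("fits in 2M window:", fits(document, "2M"))
```

For an exact number, count tokens with the model's own tokenizer rather than a character ratio, since the ratio varies by language and content type.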
Understanding Images & Voice
The update also improves Gemini's ability to understand images. For example, users can get recipes from pictures of dishes or solve math problems from photos. Gemini Live, another new feature, enhances real-time conversations with the AI, offering natural interactions and various voice options.
Integration with Google Search, Maps, Calendar, and Gmail
For travelers, Gemini 1.5 Pro simplifies trip planning by integrating flight and hotel data from Gmail with recommendations from Google Maps and Search to create customized itineraries.
[Figure: token capacity of top AI models compared with Gemini 1.5's 2 million token context window.]
The improved Gemini 1.5 Pro is now available in Gemini Advanced and to developers. A version with a two million token context window will be offered to developers in private preview.
New Gemma and Gemini Nano Models
Google has expanded its Gemma AI series with new additions, introducing PaliGemma and announcing the upcoming launch of Gemma 2.
PaliGemma: Available today, this is the first open vision-language model in the Gemma lineup, designed specifically for tasks such as image captioning, visual Q&A, and other image labeling activities. PaliGemma adds to the diversity of the Gemma family, which includes specialized models like CodeGemma for programming tasks and RecurrentGemma, which uses a recurrent architecture for more efficient inference.
Gemma 2: Set to launch in June, Gemma 2 is the next generation of the Gemma series. It is aimed at developers and researchers who need powerful yet manageable AI tools. According to Google, the 27B-parameter model in this release is competitive with some larger models while being more efficient: it can run on standard GPUs or a single TPU host through Vertex AI, making it a practical choice for a wide range of applications.
Gemini Nano in Chrome
Google announced plans to build Gemini Nano, its smallest AI model, directly into the Chrome desktop client starting with Chrome 126. This lets developers use the on-device model to power their own AI features. Google itself intends to use this capability for tools like the “help me write” feature in Gmail from Workspace Labs.
The ability to run these models efficiently on a variety of hardware comes from recent improvements in WebGPU and WASM support within Chrome. Jon Dahlke, Google’s director of product management for Chrome, mentioned that Google is also working with other browser vendors to potentially introduce a similar feature in their products. Dahlke stated, "With WebGPU, WASM, and Gemini built into Chrome, we believe the web is AI-ready."
While it's unlikely that all Chrome competitors will rely solely on Google’s AI models, the aim is to let browsers and developers choose and run their preferred models. Google's strategy includes adding several high-level APIs in Chrome that will allow for translating, captioning, and transcribing text directly in the browser using its Gemini models.
Veo Video Generator – a Competitor to Sora
Google has introduced Veo, a new text-to-video AI model designed to transform how we create videos. It is aimed at a broad range of users, from professional filmmakers to first-time creators.
Here’s a detailed look at its key capabilities:
Superior Video Quality and Styles:
– Resolution: Veo generates videos in full 1080p HD, ensuring clear, crisp visuals.
– Styles: It supports a wide range of cinematic styles, allowing users to create everything from documentary-style footage to animated clips.
Precision in Creative Control:
– Prompt Interpretation: Veo excels in understanding detailed prompts, precisely interpreting the tone and nuance required.
– Cinematic Effects: Users can direct Veo to produce specific effects like time lapses or dynamic aerial views, adding a professional touch to the videos.
User-Friendly for Video Creators:
– Broad Accessibility: Veo is designed to cater not only to experienced filmmakers but also to novices and educators, making advanced video production accessible to a wider audience.
– Educational Use: Educators can use Veo to create engaging content for teaching, making complex subjects more understandable through visual storytelling.
Upcoming Features and Integrations:
– VideoFX Access: Some of Veo’s features will soon be available through VideoFX, a new tool from Google's labs that allows creators to experiment with advanced video effects.
– Expansion to Google Products: Google plans to integrate Veo’s capabilities into YouTube Shorts and other platforms, broadening the creative possibilities for users.
With its blend of high-quality output, detailed prompt response, and user-friendly interface, Veo is set to redefine video production across various platforms and uses.
Google Gems
Lastly, subscribers will be able to create personalized versions of Gemini, called Gems, tailored for specific tasks like cooking, exercise, coding, or writing. Gems are Gemini's counterpart to OpenAI's custom GPTs. As usual, there is no immediate release; they will be available in a few months.
This upgrade also includes expanded integration with Google apps like YouTube Music and, soon, Google Calendar, Tasks, and Keep, making everyday tasks more manageable.
Summing Up
In response to OpenAI's latest update, Google has unveiled a series of improvements and new models at the Google I/O 2024 developer conference. These include the improved Gemini 1.5 Pro and the new Gemini 1.5 Flash, which boost AI abilities in translation, coding, and more, making tough tasks easier for users worldwide.
Building Gemini Nano into the Chrome desktop client is a big step forward. It brings AI directly into the web browser, creating a more interactive web experience. Additionally, Google's new video model, Veo, and the growth of the Gemma AI series with PaliGemma and the soon-to-launch Gemma 2 highlight Google's commitment to advancing AI technology across different platforms.
Google has responded well to yesterday's OpenAI Spring Update. The AI race is still on and more intense than ever.
Author:
Artem Vysotsky
May 14, 2024