Gemini Robotics, a Gemini 2.0-based model designed for robotics, Google DeepMind Technologies Limited, London, United Kingdom


Gemini Robotics: Bringing AI to the physical world

Mar 12, 2025

Our new Gemini Robotics model brings Gemini 2.0 to the physical world. It's our most advanced vision-language-action model, enabling robots that are interactive, dexterous, and general. Learn more about how we're enabling the next generation of robotic AI agents at deepmind.google/robotics

"Gemini Robotics brings AI into the physical world"

by Carolina Parada
March 12, 2025
 

Introducing EmbeddingGemma: The Best-in-Class Open Model for On-Device Embeddings

Sep 4, 2025

Discover EmbeddingGemma, a state-of-the-art 308-million-parameter text embedding model designed to power generative AI experiences directly on your hardware. Ideal for mobile-first AI, EmbeddingGemma brings powerful capabilities to your applications, enabling features like semantic search, information retrieval, and custom classification – all while running efficiently on-device. In this video, Alice Lisak and Lucas Gonzalez from the Gemma team introduce EmbeddingGemma and explain how it works. Learn how you can run this model on less than 200MB of RAM with quantization, customize its output dimensions with Matryoshka Representation Learning (MRL), and build powerful offline AI features.
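
The description mentions on-device semantic search and Matryoshka Representation Learning; as a minimal sketch of what that might look like, here is a small semantic-search example using the sentence-transformers library. The model identifier google/embeddinggemma-300m, the truncate_dim argument, and the 256-dimension choice are assumptions to verify against the official model card, not confirmed details from the post.

```python
# Minimal semantic-search sketch with a small on-device embedding model.
# Assumes EmbeddingGemma is published for sentence-transformers under an
# ID like "google/embeddinggemma-300m"; check the model card for the name.
from sentence_transformers import SentenceTransformer

# truncate_dim relies on Matryoshka Representation Learning (MRL) to keep
# only the first 256 dimensions of the embedding, trading quality for size.
model = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=256)

documents = [
    "How to reset a router to factory settings",
    "Recipe for a quick weekday pasta dinner",
    "Steps to update firmware on a home router",
]
query = "fix my wifi router"

doc_embeddings = model.encode(documents, normalize_embeddings=True)
query_embedding = model.encode(query, normalize_embeddings=True)

# With normalized vectors, cosine similarity is just a dot product.
scores = doc_embeddings @ query_embedding
best = scores.argmax()
print(f"Best match ({scores[best]:.3f}): {documents[best]}")
```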

"Introducing EmbeddingGemma: The Best-in-Class Open Model for On-Device Embeddings"

by Min Choi, Sahil Dua, Alice Lisak
September 4, 2025
 

Gemini Robotics 1.5: Enabling robots to plan, think and use tools to solve complex tasks

Sep 25, 2025

We’re powering an era of physical agents with Gemini Robotics 1.5 — enabling robots to perceive, plan, think, use tools and act to better solve complex, multi-step tasks.

🤖 Gemini Robotics 1.5 is our most capable vision-language-action (VLA) model that turns visual information and instructions into motor commands for a robot to perform a task. This model thinks before taking action and shows its process, helping robots assess and complete complex tasks more transparently. It also learns across embodiments, accelerating skill learning.

🤖 Gemini Robotics-ER 1.5 is our most capable vision-language model (VLM) that reasons about the physical world, natively calls digital tools and creates detailed, multi-step plans to complete a mission. This model now achieves state-of-the-art performance across spatial understanding benchmarks.

We’re making Gemini Robotics-ER 1.5 available to developers via the Gemini API in Google AI Studio and Gemini Robotics 1.5 to select partners.
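
Because Gemini Robotics-ER 1.5 is exposed through the Gemini API, calling it looks like any other Gemini API request with an image plus a spatial-reasoning prompt. Below is a minimal sketch using the google-genai Python SDK; the model identifier and the exact output format for point coordinates are assumptions to verify in the Google AI Studio documentation.

```python
# Sketch: asking an embodied-reasoning model to locate objects in a scene.
# Assumes the google-genai SDK and an API key in the GEMINI_API_KEY env var;
# the model name below is an assumption -- check Google AI Studio for the
# identifier actually exposed for Gemini Robotics-ER 1.5.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

with open("workbench.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed preview model ID
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Return the 2D points of every graspable object on the table, "
        "as JSON with fields 'label' and 'point'.",
    ],
)
print(response.text)  # labeled points to hand off to a downstream planner
```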
 

Fall 2025 GRASP on Robotics - Jie Tan, Google DeepMind

Nov 26, 2025

“Gemini Robotics: Bringing AI into the Physical World”
ABSTRACT
Recent advancements in large multimodal models have led to the emergence of remarkable generalist capabilities in digital domains, yet their translation to physical agents such as robots remains a significant challenge. In this talk, I will present Gemini Robotics, an advanced Vision-Language-Action (VLA) generalist model capable of directly controlling robots. Gemini Robotics executes smooth movements to tackle a wide range of complex manipulation tasks while also being robust to variations in object types and positions, handling unseen environments, and following diverse, open-vocabulary instructions. With additional fine-tuning, Gemini Robotics can be specialized to new capabilities, including solving long-horizon, highly dexterous tasks, learning new short-horizon tasks from as few as 100 demonstrations, and adapting to completely novel robot embodiments. Furthermore, I will discuss the challenges, learnings, and future research directions for robot foundation models.

PRESENTER
Jie Tan is a Senior Staff Research Scientist and Tech Lead Manager on the robotics team at Google DeepMind. His research focuses on applying foundation models and deep reinforcement learning methods to robots, with interests spanning locomotion, navigation, manipulation, simulation, and sim-to-real transfer. Jie Tan is also an adjunct associate professor at the Georgia Institute of Technology. He received his PhD at the Computer Graphics Laboratory at Georgia Tech, advised by Greg Turk and Karen Liu.
 