DeepMind's Gemini Robotics-ER 1.6
DeepMind published Gemini Robotics-ER 1.6, an embodied-reasoning model available through the Gemini API and Google AI Studio, with reported benchmark gains on tasks such as instrument reading. Coverage describes the release as helping robots interpret gauges and act in real-world settings. (deepmind.google) (interestingengineering.com)
Robots struggle with the messy parts of the real world: reading dials, judging space, and deciding whether a task is actually finished. Google DeepMind said on April 14 that it has updated its embodied-reasoning model to handle more of those jobs. (deepmind.google)

Embodied reasoning is the part of robotics that links seeing to doing. Google describes Gemini Robotics-ER 1.6 as a vision-language model that takes camera views and natural-language instructions, then plans actions for a robot or calls outside tools and controllers. (ai.google.dev) (deepmind.google)

Google said the new version outperforms Gemini Robotics-ER 1.5 and Gemini 3.0 Flash on tests for pointing, counting, single-view and multi-view success detection, and instrument reading. The company said the instrument-reading capability was developed with Boston Dynamics and covers gauges and sight glasses that robots encounter in industrial settings. (deepmind.google)

Google also said Gemini Robotics-ER 1.6 is available through the Gemini API and Google AI Studio starting April 14, with a developer Colab that provides setup examples. That moves the model from a research demo toward a product developers can test in their own systems. (deepmind.google) (ai.google.dev)

The release builds on Google DeepMind’s March 12, 2025 launch of Gemini Robotics and Gemini Robotics-ER, two Gemini 2.0-based models aimed at physical machines. In that earlier rollout, Google split the work between a vision-language-action model for direct control and a reasoning-first model that roboticists could connect to existing controllers. (deepmind.google)

That split is central to how Google is pitching the new model. DeepMind said Robotics-ER 1.6 serves as a high-level reasoning layer that can call vision-language-action models, Google Search, or third-party functions, rather than issuing every motor command itself. (deepmind.google)

Google said the model can use points in an image as intermediate reasoning steps, such as marking objects, comparing sizes, or identifying grasp locations before acting. The company tied those skills to tasks such as moving one object to another location, counting items in clutter, and checking whether a job was completed. (deepmind.google)

Google also said Robotics-ER 1.6 is its “safest robotics model to date,” citing stronger compliance with safety policies on adversarial spatial-reasoning tasks. The company did not announce a commercial robot product alongside the model, and the public evidence so far is limited to Google’s own benchmarks and examples. (blog.google) (deepmind.google)

For developers, the immediate change is practical: a robotics reasoning model can now be called through the same Gemini platform used for other Google models. For robots in factories and labs, the test will be whether better gauge reading and scene understanding hold up outside Google’s evaluations. (ai.google.dev) (interestingengineering.com)
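The pointing behavior Google describes, where the model marks objects or grasp locations in an image before acting, implies that a developer's code has to turn those points into something a robot controller can use. The sketch below is a minimal, hypothetical illustration and not Google's code. It assumes the model has been prompted to return a JSON list shaped like [{"point": [y, x], "label": "..."}], with coordinates normalized to a 0 to 1000 range. Gemini models commonly report image coordinates on that scale, but the exact schema for Robotics-ER 1.6 is an assumption here and should be checked against Google's documentation. The names parse_points and PixelPoint are illustrative only.

```python
import json
from dataclasses import dataclass


@dataclass
class PixelPoint:
    """A labeled point in pixel coordinates (hypothetical helper type)."""
    label: str
    x: int
    y: int


def parse_points(response_text: str, width: int, height: int) -> list[PixelPoint]:
    """Convert model point output into pixel coordinates for one camera frame.

    ASSUMPTION: the response contains a JSON list such as
    [{"point": [y, x], "label": "..."}] with y and x normalized to 0-1000.
    Verify this schema against the Robotics-ER 1.6 documentation.
    """
    # Model replies may wrap the JSON in extra text or a code fence, so keep
    # only the span from the first "[" to the last "]".
    start = response_text.find("[")
    end = response_text.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON list found in model response")
    items = json.loads(response_text[start : end + 1])

    points = []
    for item in items:
        y_norm, x_norm = item["point"]  # assumed [y, x] order
        points.append(
            PixelPoint(
                label=item.get("label", ""),
                # Scale from the assumed 0-1000 range to this frame's pixels.
                x=round(x_norm / 1000 * width),
                y=round(y_norm / 1000 * height),
            )
        )
    return points


if __name__ == "__main__":
    sample = '[{"point": [500, 250], "label": "pressure gauge"}]'
    print(parse_points(sample, width=640, height=480))
    # [PixelPoint(label='pressure gauge', x=160, y=240)]
```

Keeping the model's coordinates normalized and scaling them only at this final step means the same parsing code works for any camera resolution. A grasp planner or a vision-language-action model downstream would then receive pixel positions tied to the specific image that was sent.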