Google DeepMind unveils AI cursor
- Google DeepMind said on May 12, 2026 it is developing an experimental AI-enabled mouse pointer that can understand on-screen context and spoken commands.
- Adrien Baranes and Rob Marchant said the prototype aims to replace “text-heavy prompts” with pointing and speech, using Gemini to infer context.
- Google DeepMind published demo videos and design principles on its website, while Project Astra and Gemini Live remain the named follow-on products.

Google DeepMind published a research post on May 12 outlining an experimental “AI-enabled pointer” that can understand what a user is pointing at on screen and respond to short spoken requests. The prototype, powered by Gemini, is designed to work across websites, documents and other software without forcing users to move content into a separate chat window, the company said.

Google DeepMind said it is sharing “experimental demos” and the design principles behind the work, and is not announcing a product launch or release date. The post was written by Adrien Baranes and Rob Marchant, two Google DeepMind researchers.

### What exactly did Google DeepMind show?

Google DeepMind said the demos show a cursor that can combine pointing with speech, so a user can indicate an item on screen and say something like “Show me directions.” The company said the system is meant to infer context from what the cursor is over, rather than requiring a long written prompt. The May 12 post included examples tied to Google AI Studio, maps, PDFs, tables and recipes. (deepmind.google)

In those examples, a user could point at a PDF and ask for a bullet-point summary to paste into an email, hover over a table and ask for a pie chart, or highlight recipe text and ask for ingredient quantities to be doubled, according to the post.

### How does the company say the system works?

Adrien Baranes and Rob Marchant wrote that the project is built around four interaction principles. (deepmind.google) Those principles are to keep AI available where the user is already working, capture visual and semantic context around the pointer, rely on short expressions such as “fix this” or “move that here,” and let users combine speech with selection and pointing, according to the post.

Google DeepMind said the system is intended to let “the computer ‘see’ and understand” which word, paragraph, image region or code block matters to the user.
The company framed that as a way to reduce what it called “AI detours” between the user’s work and a separate assistant window.

### Is this a shipping product or still a research prototype?

The May 12 publication described the pointer as an “experimental environment” and referred to “future user interfaces,” without announcing commercial availability. (deepmind.google) The company did not provide pricing, a waitlist for the pointer itself, or a timetable for release in the post.

Project Astra remains the clearer path Google has identified for bringing related capabilities into products. (deepmind.google) Google DeepMind says on Astra’s page that it is working to bring Astra features to Gemini Live, Search and new form factors such as glasses, and lists screen sharing, video understanding, tool use and interface control among Astra’s capabilities. (deepmind.google)

### What other Google systems does this connect to?

Google’s March 26 developer post on Gemini 3.1 Flash Live said the model is available in preview through the Live API in Google AI Studio for real-time voice and vision agents. Google said that model is designed for low-latency, real-time conversations and supports more than 90 languages.

The pointer demos also fit with the capabilities Google describes for Project Astra, including screen understanding, proactive responses and tool use across products such as Search, Gmail, Calendar and Maps. (deepmind.google) Google did not say in the pointer post that Astra powers the cursor prototype, but the company has publicly grouped both efforts under broader work on more natural multimodal interaction. That connection is an inference based on the overlap in capabilities Google describes across the two pages. (blog.google)

### How does this fit into the broader field of AI agents for software interfaces?
A December 2024 survey on arXiv described “LLM-brained” GUI agents as a fast-moving area aimed at interpreting interface elements and executing actions from natural-language instructions. The paper said these systems are being developed for web navigation, mobile apps and desktop automation, though it was a general survey of the field rather than a study of Google’s pointer. (deepmind.google)

Google DeepMind’s post places its own work inside that same set of problems: how to reduce prompting overhead, identify what on-screen element the user means, and connect language, vision and action in one interface layer.

The company’s next public milestones in related products are the continued rollout of Project Astra features into Gemini Live and the developer availability of Gemini 3.1 Flash Live through Google AI Studio, both of which Google has already identified on its official pages. (arxiv.org) (deepmind.google)