Oppo X-OmniClaw open-source phone agent

- Oppo’s Multi-X team published X-OmniClaw code on GitHub on May 18, 2026, releasing an Android agent that runs on-device across apps.
- The GitHub repository says X-OmniClaw uses “on-screen UI state, real-world visual context, and audio input” and is licensed under Apache-2.0.
- The code, technical report and demos are now available through Oppo’s GitHub repository and project page.

Oppo’s Multi-X team has released X-OmniClaw as an open-source Android agent that is designed to run on a physical phone rather than inside a cloud-hosted virtual device, according to the project’s GitHub repository and technical report. The code appeared publicly on GitHub on May 18, 2026, under an Apache-2.0 license, and the accompanying paper describes a system that combines perception, memory and action for tasks across Android apps.

Decrypt reported the release on May 18, describing it as a phone agent that can see, hear and act without routing the phone through the cloud. The repository says the system works on “physical Android devices” and performs “cross-app operations through on-device tools.”

### What exactly did Oppo release?

GitHub shows the project under the name “OPPO-Mente-Lab/X-OmniClaw,” with the repository labeled public and the latest visible initialization commits posted about 13 hours before it was crawled. The repository describes X-OmniClaw as “an edge-native Multimodal Android Agent” that integrates “multimodal perception, memory, and action.” (github.com)

The technical report lists authors from the Multi-X Team at OPPO AI Center, including Xiaoming Ren, Ru Zhen, Yanhao Zhang and Haonan Lu. The paper says the system is a “unified mobile agent” for Android and positions it as a response to demand for mobile-based personal agents that can handle “complex and intuitive interactions.”

### How is this different from a cloud phone agent?

The GitHub README says X-OmniClaw “operates independently of virtual environments” and functions “directly on physical Android devices.” The repository adds that it captures “real-time visual telemetry” and executes “native touch interactions,” which means the agent is meant to work against the live phone interface rather than a mirrored cloud instance. (github.com)

The project page says the architecture is “edge-native on Android,” with “on-device execution” and “cloud LLM reasoning.” That wording is narrower than a blanket no-cloud claim: Oppo’s materials describe core execution on the device, while reserving cloud language-model use for reasoning in the architecture summary. (arxiv.org)

### What inputs does the system use?

The repository says “Omni” refers to three sensing domains: “on-screen UI state, real-world visual context, and audio input.” (github.com) The technical report uses similar language, saying Omni Perception integrates UI states, real-world visual contexts and speech inputs into a unified pipeline.

The project page’s demos show how those inputs are supposed to work in practice. (eggplant95.github.io) One demo describes a user asking, “How much is this bottle of water on Taobao?” and shows camera-plus-voice input leading to a search and price comparison. Another demo describes a “screen companion” that follows the active screen, accepts push-to-talk input and performs multi-step execution with live feedback. (github.com)

### How does it remember and repeat actions?

The paper says Omni Memory combines working memory for task continuity with long-term personal memory “distilled from local data.” The same report says Omni Action uses a hybrid grounding strategy that combines XML metadata with visual perception for app interaction.

The most concrete mechanic is in the paper’s description of “Behavior Cloning and Trajectory Replay.” Oppo says the system can capture user navigation as reusable skills, allowing “precise direct-access execution.” The project page illustrates that with a demo labeled “Instant portal to a Meituan flash-sale page,” which it describes as behavior cloning. (eggplant95.github.io) (arxiv.org)

### Where can developers inspect it now?

The public materials are already live in three places: Oppo’s GitHub repository, the arXiv-hosted technical report and a project page with demos.
GitHub also lists recent updates dated March 25, March 31, April 20 and April 22, including scheduled automation, a local speech-vision loop and tighter execution policies, suggesting the codebase was under active development before the public release. (arxiv.org) (github.com)
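
For readers who want a feel for the “hybrid grounding” idea the paper mentions, combining XML metadata with visual perception, the following is a minimal Python sketch and not X-OmniClaw’s code. It assumes a UI hierarchy in the style of Android’s uiautomator dumps, where each `node` carries a `bounds` attribute such as `[40,120][1040,220]`. The function names, the sample XML and the `visual_locator` callback (a stand-in for a vision model) are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]

# Assumed bounds format, as in Android uiautomator dumps: "[x1,y1][x2,y2]".
_BOUNDS = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def center_from_xml(ui_xml: str, label: str) -> Optional[Point]:
    """Return the center of the first node whose text or content-desc equals label."""
    root = ET.fromstring(ui_xml)
    for node in root.iter("node"):
        if label in (node.get("text"), node.get("content-desc")):
            match = _BOUNDS.fullmatch(node.get("bounds", ""))
            if match:
                x1, y1, x2, y2 = map(int, match.groups())
                return ((x1 + x2) // 2, (y1 + y2) // 2)
    return None


def ground(
    ui_xml: str,
    label: str,
    visual_locator: Callable[[str], Optional[Point]],
) -> Optional[Point]:
    """Prefer structured XML metadata; fall back to a visual locator if it has no match."""
    point = center_from_xml(ui_xml, label)
    return point if point is not None else visual_locator(label)


# Hypothetical screen with a single labeled element.
SAMPLE_XML = """
<hierarchy>
  <node text="" content-desc="" bounds="[0,0][1080,2400]">
    <node text="Search" content-desc="" bounds="[40,120][1040,220]" />
  </node>
</hierarchy>
"""

if __name__ == "__main__":
    # Hypothetical stand-in for a vision model that returns screen coordinates.
    stub_locator = lambda label: (500, 900)
    print(ground(SAMPLE_XML, "Search", stub_locator))  # resolved from XML metadata
    print(ground(SAMPLE_XML, "Cart", stub_locator))    # resolved by the visual fallback
```

In this sketch the structured lookup runs first because it is cheap and exact when the app exposes labels, and the visual path only handles elements the hierarchy does not describe. Whether X-OmniClaw orders the two signals this way is not stated in the materials cited here.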
