AGIBOT open‑sources WORLD dataset

AGIBOT released the AGIBOT WORLD 2026 dataset as an open resource for embodied AI and has staged Phase 1 imitation‑learning models on Hugging Face. (x.com) The team also published Genie Sim 3.0, a large‑scale LLM‑driven simulator and benchmark suite for instruction‑based manipulation and RL training. (x.com)

Robots learn by watching demonstrations and repeating them, much like an apprentice copying a skilled worker. AGIBOT has now released its AGIBOT WORLD 2026 robot-training dataset publicly, alongside early imitation-learning models on Hugging Face. (huggingface.co)

The dataset card says AGIBOT WORLD 2026 was collected in “100% real-world environments” across commercial spaces, homes, and other general-purpose settings. AGIBOT says the data was captured on its G2 robot platform using a free-form collection process and then annotated for developers. (huggingface.co)

The Hugging Face repository shows an `ImitationLearning` directory of about 2.92 terabytes, and the AGIBOT WORLD organization page lists the dataset as updated about five days ago. That same organization page also shows several AGIBOT robot models and related datasets published under the agibot-world account. (huggingface.co 1) (huggingface.co 2)

Simulation is the robot equivalent of a flight simulator: it lets researchers train and test machines in software before risking time, money, or hardware in the real world. AGIBOT’s companion release, Genie Sim 3.0, is an open simulation platform that the company says covers digital asset generation, scene generalization, data collection, and automated evaluation. (github.com) (agibot-world.com)

AGIBOT said on January 6, 2026, at the Consumer Electronics Show in Las Vegas that Genie Sim 3.0 was built with NVIDIA Isaac Sim and NVIDIA Omniverse. The company says the system can generate simulation scenes from natural-language instructions and evaluate models across more than 200 tasks and 100,000-plus scenarios. (agibot.com)

AGIBOT and the project paper both say Genie Sim 3.0 includes more than 10,000 hours of open synthetic data tied to robot-operation scenarios. The paper, posted to arXiv on January 5, 2026, says the team tested “zero-shot” transfer, meaning policies trained in simulation were run on real robots without task-specific retraining. (agibot.com) (arxiv.org)

The company is trying to address two bottlenecks, scarce real-world robot data and expensive physical testing, in one stack. The dataset page says AGIBOT uses digital-twin methods to build a 1:1 simulation version of a scene, while the Genie Sim materials say the simulation data is being released alongside the real-world collection. (huggingface.co) (agibot-world.com)

AGIBOT says Genie Sim’s asset library includes 5,140 validated three-dimensional objects across retail, industry, catering, home, and office settings. The company also says a simulation-ready object can be created from a single 60-second orbital video, and full environments can be captured with RGB cameras, LiDAR scans, and centimeter-level positioning hardware. (github.com) (agibot.com)

The release comes as robotics labs are pushing to make “embodied artificial intelligence” less dependent on bespoke, private data pipelines. AGIBOT’s Hugging Face page already hosted earlier Alpha and Beta datasets, and the 2026 release extends that open-data push with a newer real-world corpus and staged policy checkpoints. (huggingface.co 1) (huggingface.co 2)

For researchers, the immediate result is straightforward: one public download for real-world demonstrations, another for simulation assets and code, and a set of early models to test against both. AGIBOT is betting that open data plus open simulation will pull more robot training out of closed labs and onto shared infrastructure. (huggingface.co) (github.com)
