Humyn Labs uses 20,000 hours

- Humyn Labs said on May 24 it is using more than 20,000 hours of first-person household video with KGeN to train humanoid AI systems.
- The corpus spans laundry, dishwashing and cooking footage from more than 30 countries, recorded in homes to capture hand movements, objects and context.
- Humyn Labs has published a case study on the dataset, while related egocentric-robotics work continues from Figure and academic groups.

Humyn Labs is pushing a simple idea that robotics companies have been chasing for years: if humanoids are supposed to work in homes, they need to learn from what homes actually look like. The company says it has assembled more than 20,000 hours of first-person household video, gathered with KGeN, to train models on chores such as laundry, dishwashing and cooking. The footage is meant to capture the visual perspective, hand motion and object context that robots would face outside staged lab settings. The announcement lands as a wider robotics field is putting more weight on egocentric human data as a way to improve robot learning in messy real environments.

### Why are companies collecting first-person household video at this scale?

Humyn Labs says the dataset was captured from head-mounted smartphones while participants performed everyday household tasks in their own homes. In its case study, the company described the corpus as continuous first-person footage designed to reflect “the exact perspective humanoid systems require for task learning.” (humynlabs.ai)

The company’s broader pitch is that internet-scale text and video do not contain enough structured data about real-world physical actions. Moneycontrol reported on April 13 that co-founder Manish Agarwal said “there is no internet data for real-world actions,” and that Humyn Labs was building data pipelines for physical AI across India, Southeast Asia, Latin America and the Middle East. (humynlabs.ai)

### What does KGeN do in this arrangement?

KGeN is tied to Humyn Labs through both infrastructure and founders. Moneycontrol reported that Agarwal and Ishank Gupta earlier co-founded KGeN in 2023, and that KGeN is building a blockchain-based verified distribution protocol to help companies reach and verify real users. (moneycontrol.com)

An Outposts report on the partnership said KGeN was powering Humyn Labs’ “Proof of Expert” system, which validates and tracks human expertise used in AI model training. That matters because collecting household video at scale requires recruiting contributors, checking task quality and linking recordings to reliable worker histories. (moneycontrol.com)

### Why does egocentric data matter more than a normal video dataset?

Figure AI made a similar argument in a September 2025 post about its “Project Go-Big” humanoid pretraining effort. Figure said robotics lacks a large-scale equivalent to ImageNet or Wikipedia, and argued that humanoids can learn directly from everyday human video because their perspectives and kinematics resemble our own. (outposts.io)

Academic work is moving in the same direction. An April 2026 arXiv paper for the EgoLive dataset said robot learning is constrained by a shortage of large, high-quality datasets and argued that human egocentric video offers a scalable, in-the-wild alternative to teleoperation-heavy collection methods.

### How large is 20,000 hours in this corner of robotics?

Ego4D, one of the best-known first-person video research datasets, contains 3,670 hours of video across 9 countries, according to the project site. (figure.ai) Humyn Labs’ claimed 20,000-hour corpus is therefore much larger in raw duration, though it is targeted at household task learning rather than serving as a broad public academic benchmark.

Scale alone does not settle usefulness. (arxiv.org) The value for robot training depends on annotation quality, task diversity, camera consistency, hand visibility and whether the data can be paired with action labels or robot learning pipelines, according to the EgoLive paper’s discussion of dataset design.

### What household tasks are companies actually trying to teach?

Figure said Helix had previously focused on upper-body manipulation tasks including laundry folding, dishwasher loading and package reorientation before extending into navigation. (ego4d-data.org) Humyn Labs’ own description centers on chores such as laundry, dishes and cooking, which are difficult because they involve deformable objects, clutter, sequencing and changing room context. (arxiv.org)

Humyn Labs has posted a case study describing the 20,000-hour corpus, and its founders said in April they had committed $20 million to expand data collection operations globally. That expansion, and any future customer disclosures, will be the next concrete markers of whether this dataset moves from collection into deployed humanoid training programs. (humynlabs.ai) (figure.ai)
