Ex‑DeepMind team bets on visual AI
Former DeepMind researchers launched Elorian to tackle visual prompts, arguing current large models struggle to ‘make sense’ of visual input. (bloomberg.com) Investors and founders pitch visual AI as the next layer for scene understanding, a capability that could later be applied to occupancy analytics or scene adaptation in smart buildings, even though the technology is still immature. (siliconvalley.com)
Most artificial intelligence can look at a photo and name objects, but it still often misses the actual situation, like a tourist who can point at a stove and a pan but cannot tell that dinner is burning. Former Google DeepMind researcher Andrew Dai says that gap is big enough to build a company around, and his new startup Elorian launched on April 9, 2026. (bloomberg.com)
Elorian is based in Palo Alto, and it came out of stealth with $55 million in funding at a reported $300 million valuation. Bloomberg and Tech in Asia both say the backers include Menlo Ventures, Altimeter Capital, Striker Venture Partners, Nvidia, and Google chief scientist Jeff Dean. (bloomberg.com) (techinasia.com)
The bet is not on prettier image captions. Elorian says current large models still struggle to reason across images, video, audio, and text at the same time, which is the difference between spotting a chair in a room and understanding that three people are waiting, one seat is blocked, and a fourth person just walked in. (bloomberg.com) (techmeme.com)
That problem has become more visible as chatbots learned to talk first and see later. A lot of today’s systems still bolt image tools onto language models, while Elorian is pitching models built from the start to handle several kinds of data together. (pressvia.com) (houdao.com)
The founders are selling experience as much as code. Dai spent about 14 years at Google and DeepMind, Yinfei Yang worked on artificial intelligence research at both Google and Apple, and Bloomberg says cofounder Seth Neel is a former Harvard professor focused on data and artificial intelligence. (bloomberg.com) (sites.google.com)
Elorian does not have revenue yet, and Dai told Bloomberg the company plans to release its first public reasoning model in about 12 months. That means investors are funding a team and a thesis before there is a product in customers’ hands. (bloomberg.com)
The thesis fits a wider push toward what many investors now call physical-world or multimodal artificial intelligence. Instead of answering questions about documents, these systems are supposed to interpret rooms, machines, traffic, shelves, and people moving through space. (techinasia.com) (houdao.com)
One place that shows up fast is buildings. Vision-based occupancy systems already count people from camera feeds for energy control, safety, and operations, and researchers are still working on accuracy problems caused by camera placement, changing rooms, and messy real-world conditions. (github.com) (viso.ai) (sciencedirect.com)
That is why “scene understanding” keeps coming up in pitches. If a model can tell not just that a room contains people but that a meeting ended, a hallway is backing up, or a floor is empty enough to dial down heating and cooling, it moves from counting bodies to adapting the building itself. (nature.com) (tiaonline.org)
The catch is that this market is still early, and the hard part is reliability, not demos. Recent building research still describes real-time occupancy detection as difficult in live environments, which is another way of saying Elorian is chasing a problem that is real, expensive, and nowhere near solved. (arxiv.org) (sciencedirect.com)