Alibaba Model Beats Google, NVIDIA

Published by The Daily Scout

What happened

Alibaba's new embodied intelligence foundation model, RynnBrain, has reportedly set new records on 16 embodied AI benchmarks. Social media users noted that it outperforms Google's Gemini and NVIDIA's Cosmos models in spatial understanding and motion prediction.

Why it matters

  • RynnBrain was developed by Alibaba's DAMO Academy in a project led by Zhao Deli, head of the Embodied Intelligence Lab. The model is built on Alibaba's Qwen3-VL vision-language model and was optimized with RynnScale, a self-developed training architecture that reportedly doubled training speed.
  • The model's core technical innovations are "spatiotemporal memory" and "physical-space reasoning." Spatiotemporal memory lets the robot recall the past states of objects, so it can resume a task after an interruption. Physical-space reasoning combines textual and spatial cues to ground the model's understanding in the real world.
  • Alibaba has open-sourced several versions of RynnBrain: models with 2 billion and 8 billion parameters, plus a 30-billion-parameter Mixture-of-Experts (MoE) variant. The 30B MoE model is notably efficient, using only 3 billion active parameters during inference while outperforming models with 72 billion parameters.
  • Alongside the foundation models, Alibaba released three post-trained specialist versions: RynnBrain-Plan for robot task planning, RynnBrain-Nav for vision-language navigation, and RynnBrain-CoP for chain-of-point reasoning.
  • Citing a lack of detailed evaluation methods in the industry, DAMO Academy also introduced an open-source benchmark, RynnBrain-Bench. It assesses fine-grained spatiotemporal embodied AI tasks across four dimensions: object cognition, spatial cognition, grounding, and pointing.
  • A demonstration video, "RynnBrain's Housework Diary," shows the model at work in a home: a robot arranges tableware according to specific instructions, picks a set number of oranges from a bowl of mixed fruit, and retrieves items from a refrigerator.

Key numbers

  • 16: embodied AI benchmarks on which RynnBrain has reportedly set new records.
  • 2x: the reported training speedup from RynnScale, Alibaba's self-developed architecture used to optimize the model, which is built on Qwen3-VL.
  • 2B, 8B and 30B: parameter counts of the open-source RynnBrain releases; the 30B model is a Mixture-of-Experts (MoE) variant.
  • 3B vs. 72B: the active parameters the 30B MoE model uses during inference, versus the size of the models it outperforms.
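The efficiency claim for the 30-billion-parameter MoE variant comes down to simple arithmetic. The short Python sketch below works it through using only the figures reported here, under one assumption the report does not state: that the 72-billion-parameter comparison models are dense, meaning every parameter is used for every token. The constant and function names are illustrative, not taken from any RynnBrain code.

```python
# Figures as reported: a 30B-parameter MoE model with 3B active parameters,
# compared against 72B-parameter models.
# ASSUMPTION: the 72B comparison models are dense (all weights active per token).
TOTAL_PARAMS_B = 30     # MoE variant, total parameters (billions)
ACTIVE_PARAMS_B = 3     # parameters active per token at inference (billions)
DENSE_BASELINE_B = 72   # comparison model size (billions)


def active_fraction(total: float, active: float) -> float:
    """Share of the MoE model's weights that are used for each token."""
    return active / total


def active_ratio(baseline: float, active: float) -> float:
    """How many times more parameters the dense baseline uses per token."""
    return baseline / active


frac = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
ratio = active_ratio(DENSE_BASELINE_B, ACTIVE_PARAMS_B)
print(f"Active share of MoE weights: {frac:.0%}")              # 10%
print(f"Dense baseline uses {ratio:.0f}x more params per token")  # 24x
```

Per-token compute scales roughly with active parameters, which is why the comparison is framed this way. Memory is a different story: an MoE model typically still has to keep all 30 billion weights loaded, even though only about a tenth are used for any given token.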

What happens next

  • Beyond the foundation models, Alibaba also released three post-trained specialist versions: RynnBrain-Plan for robot task planning, RynnBrain-Nav for vision-language navigation, and RynnBrain-CoP for chain-of-point reasoning.

