Baidu’s Famou‑Agent 2.0 tops MLE‑Bench

Baidu announced Famou‑Agent 2.0 and reported that it reached state‑of‑the‑art performance on MLE‑Bench, a benchmark of machine learning engineering tasks for AI agents. The post frames the release as evidence that multi‑agent frameworks are advancing quickly in China’s AI stack (x.com).

Baidu said Famou‑Agent 2.0 now leads MLE‑Bench, a public test that measures whether artificial intelligence agents can do machine learning engineering work end to end. (github.com) MLE‑Bench is an OpenAI benchmark built from 75 Kaggle competitions, and it tests concrete tasks such as preparing data, training models, and running experiments. (openai.com)

On the current public leaderboard, Famou‑Agent 2.0 is listed at 64.44% overall with a 24‑hour runtime on a February 23, 2026 entry using Gemini‑3‑Pro‑Preview. The next listed systems are AIBuildAI at 63.11% and CAIR MARS+ at 62.67%. (github.com) The same leaderboard also shows an earlier Famou‑Agent 2.0 entry at 59.56% overall using Gemini‑2.5‑Pro on December 27, 2025. That means the latest top score reflects both Baidu’s agent framework and the model configuration attached to that run. (github.com)

This benchmark tracks a specific kind of artificial intelligence progress: not chat answers, but software agents that can keep trying, test their own code, and improve a pipeline the way a machine learning engineer would. OpenAI’s benchmark paper said its original best setup in 2024 reached at least Kaggle bronze‑medal level in 16.9% of competitions. (openai.com)

Baidu describes Famou as a code agent for “verifiable” optimization problems, meaning tasks where the system can run code, score the result, and use that score to guide the next attempt. Baidu’s documentation says the product uses an evolutionary framework to search for better algorithms automatically. (cloud.baidu.com) In its public code repository, Baidu says the framework combines large language model reasoning with large‑scale evolutionary search and runs on distributed infrastructure built on Ray. The repository also says the system has been tested in machine learning, operations research, graphics processing unit kernel optimization, and mathematics tasks. (github.com)

The leaderboard entry does not make Famou a stand‑alone model winner in the usual sense, because MLE‑Bench ranks agent systems together with the language models used inside them. The current top Famou run is explicitly labeled as powered by Gemini‑3‑Pro‑Preview. (github.com)

Baidu’s own materials present Famou as a commercial product inside Baidu AI Cloud, not just a lab demo. That fits a wider shift in China’s artificial intelligence market toward agent frameworks that wrap foundation models with tools, memory, evaluators, and automated search. (cloud.baidu.com)

For now, the clearest public fact is narrower than the marketing: Famou‑Agent 2.0 sits at the top of the current MLE‑Bench leaderboard, and the result puts more attention on agent systems that can write, test, and revise machine learning workflows on their own. (github.com)
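
For readers unfamiliar with the "run, score, revise" loop that Baidu describes for verifiable optimization problems, the general idea can be illustrated with a toy evolutionary search. This is only a sketch under stated assumptions and is not Famou's implementation: the candidates here are pairs of numbers standing in for programs, and the names `score`, `mutate`, and `evolve`, along with the objective and every parameter value, are hypothetical. A real system would presumably have a language model propose code changes and an evaluator run that code to produce the score.

```python
import random
from typing import Callable, List, Tuple

Candidate = List[float]  # stand-in for a program; purely illustrative


def score(candidate: Candidate) -> float:
    """Hypothetical verifiable objective: higher is better, best value 0 at (3, -1)."""
    x, y = candidate
    return -((x - 3.0) ** 2 + (y + 1.0) ** 2)


def mutate(candidate: Candidate, rng: random.Random, step: float = 0.5) -> Candidate:
    """Produce a slightly changed copy of a candidate (the 'next attempt')."""
    return [value + rng.gauss(0.0, step) for value in candidate]


def evolve(
    score_fn: Callable[[Candidate], float],
    generations: int = 50,
    population_size: int = 20,
    keep: int = 5,
    seed: int = 0,
) -> Tuple[Candidate, float, List[float]]:
    """Score every candidate, keep the best few, and mutate them to refill the pool."""
    rng = random.Random(seed)
    population = [
        [rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0)]
        for _ in range(population_size)
    ]
    history: List[float] = []  # best score seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=score_fn, reverse=True)
        survivors = ranked[:keep]  # survivors carry over unchanged (elitism)
        history.append(score_fn(survivors[0]))
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - keep)
        ]
        population = survivors + children
    best = max(population, key=score_fn)
    return best, score_fn(best), history


if __name__ == "__main__":
    best, best_score, history = evolve(score)
    print("best candidate:", best)
    print("best score:", best_score)
```

Because the top candidates survive each round unchanged, the best score in this sketch can never get worse from one generation to the next. That property is what makes the score usable as a guide: every attempt is checked against the same measurable objective.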
