Arena.ai ranks Muse Spark
Arena.ai’s evaluation put Meta’s Muse Spark near the top of recent multimodal releases, tied for #3 in text tasks and #2 in vision tasks, Meta’s strongest model showing since early 2025. The evaluation highlighted Muse Spark’s strengths across coding and business tasks. (x.com)
Arena.ai’s latest leaderboard put Meta’s new Muse Spark near the top of recent multimodal models days after Meta unveiled it on April 8. (arena.ai) (about.fb.com)
Arena’s text leaderboard shows Muse Spark tied for No. 3 overall, and its vision leaderboard shows the model at No. 2 overall. Arena’s help documentation says those rankings are based on human preference votes from head-to-head comparisons between model outputs. (arena.ai 1) (arena.ai 2) (help.arena.ai)
Meta said Muse Spark is the first model from Meta Superintelligence Labs and that it already powers the Meta AI app and website. Meta also said the model will roll out to WhatsApp, Instagram, Facebook, Messenger, and its AI glasses in the coming weeks. (about.fb.com)
Multimodal models handle text and images in the same system, so the same model can answer a coding question, read a chart, or describe a photo. Arena splits those capabilities into separate leaderboards for text and vision, then breaks text down further into categories such as coding and business. (arena.ai 1) (arena.ai 2) (arena.ai 3)
The showing matters for Meta because the company spent much of 2025 ceding benchmark attention to OpenAI, Google, and Anthropic while rebuilding its model organization. Meta said Muse Spark is an “early data point” for a new model series and that larger models are still in development. (about.fb.com)
Arena’s category pages also point to the kinds of tasks where Muse Spark is competing most directly. The text leaderboard includes dedicated views for coding and for business, management, and financial operations, the same areas Arena highlighted in its post about the model’s showing. (arena.ai 1) (arena.ai 2)
Arena is not a lab-run benchmark in the old style, where one company publishes a fixed test set and grades itself against it. Its rankings come from users comparing anonymous model answers, a format that has gained influence because it measures which outputs people prefer in live use. (help.arena.ai) (arena.ai)
That method also has limits. Arena lets users filter by category, price, license type, and prompt difficulty, so a model’s placement can shift depending on which slice of the leaderboard a reader is looking at. (arena.ai) (arena.ai)
For Meta, the immediate test is not the screenshot of a leaderboard but whether Muse Spark holds up as it reaches more products and more users. The company is already using it in Meta AI, and Arena’s rankings gave it an early public result to point to. (about.fb.com) (arena.ai)