New Benchmark for On-Device AI Released
Salesforce AI Research has developed MobileAIBench, a new benchmarking suite for systematically evaluating large language and multimodal models on consumer hardware. The suite tests model performance on metrics like latency, power consumption, and accuracy for tasks involving text, images, and speech. The results are intended to help developers understand the trade-offs between cloud-based performance and the privacy and speed of local, on-device inference.
- MobileAIBench is the first open-source framework designed specifically for testing the task-specific performance of large language and multimodal models on mobile devices. It consists of a desktop evaluation library and an iOS app for measuring latency and hardware utilization on a real iPhone.
- The benchmark's on-device testing was conducted on an iPhone 14, measuring metrics like CPU and RAM usage, battery drain, and time-to-first-token. This focus on real-world hardware gives developers practical insight into how models will perform in users' hands.
- A key finding from the research is that while quantization is effective at reducing model size, extreme quantization to 3 bits can cause a significant drop in accuracy without a corresponding improvement in inference latency. For multimodal models, performance generally remains consistent until the 3-bit quantization level, with some models like Moondream2 showing surprising robustness.
- The research highlights a clear trade-off between a model's accuracy and its disk usage, with a linear trend showing that higher accuracy generally correlates with larger model sizes. The benchmark helps developers identify models that offer the best balance of performance and size for their specific needs.
- While other benchmarks like MLPerf Mobile evaluate the performance of mobile AI, MobileAIBench is unique in its focus on the impact of quantization on task-specific accuracy and its comprehensive analysis of both performance and resource consumption on real devices.
- The project is led by researchers at Salesforce AI Research, including Silvio Savarese, an Executive VP and Chief Scientist at Salesforce. This work is part of Salesforce's broader strategy to advance AI and empower developers with the tools needed to build the next generation of AI-powered CRM applications.
- For engineers interested in the practical application of AI, MobileAIBench provides reproducible evaluations for a range of open-source models, including TinyLLaMA, Phi-2, Gemma, and StableLM-Zephyr. This allows developers to assess the viability of these models for their own on-device applications.
- The development of MobileAIBench underscores the growing importance of on-device AI for enhancing user privacy, enabling offline functionality, and improving application responsiveness. As the field of AI continues to evolve, we can expect to see a greater emphasis on optimizing models for resource-constrained environments.
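
To make the size side of the quantization trade-off concrete, the rough arithmetic can be sketched in a few lines of Python. This is an illustration only, not part of MobileAIBench: the function name and the 2.7-billion-parameter count are assumptions chosen for the example, and the estimate ignores quantization metadata (scales, zero points) and file-format overhead, so real files will be somewhat larger.

```python
def approx_weight_size_gb(num_params: float, bits_per_weight: int) -> float:
    """Rough weight-storage estimate in GB (1 GB = 1e9 bytes).

    Ignores quantization metadata and file overhead, so treat the
    result as a lower bound rather than an exact file size.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical model size, chosen only for illustration.
NUM_PARAMS = 2.7e9

# Estimated weight storage at several common bit widths.
sizes = {bits: approx_weight_size_gb(NUM_PARAMS, bits) for bits in (16, 8, 4, 3)}

for bits, gb in sizes.items():
    print(f"{bits:>2}-bit: ~{gb:.2f} GB")
```

Under these assumptions, going from 4 bits to 3 bits saves only about a quarter of the remaining weight storage, which helps explain why the accuracy cost reported at 3 bits can outweigh the benefit.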