New Framework Released for Benchmarking On-Device AI Models
A new benchmarking framework called MobileAIBench has been released to evaluate large language and multimodal models on devices like smartphones and embedded systems. The framework is designed to measure real-world performance metrics beyond raw speed, including memory footprint, power consumption, and user-perceived latency. This enables a more systematic approach to selecting models that fit the constraints of edge hardware.
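Memory footprint is often the first filter when deciding whether a model fits on a phone. A quick way to reason about it is to multiply the parameter count by the bits stored per weight. The sketch below does only that. It is an illustrative assumption, not MobileAIBench's own method. The helper names `weight_bytes` and `fits_in_budget` are hypothetical. Because the estimate ignores the KV cache, activations, quantization scales, and runtime overhead, its numbers are lower bounds.

```python
"""Back-of-envelope weight-memory estimate for on-device model selection.

Assumption: counts only the weights (parameters x bits per weight). Real
memory use also includes the KV cache, activations, per-block quantization
scales, and runtime overhead, so treat these numbers as lower bounds.
"""


def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Return the storage needed for the weights alone, in bytes."""
    return num_params * bits_per_weight / 8


def fits_in_budget(num_params: float, bits_per_weight: float, budget_gb: float) -> bool:
    """Hypothetical helper: does the weight footprint fit a RAM budget (decimal GB)?"""
    return weight_bytes(num_params, bits_per_weight) <= budget_gb * 1e9


if __name__ == "__main__":
    # Compare a 1.1B-parameter model with a 7B-parameter model at common precisions.
    for name, params in [("1.1B model", 1.1e9), ("7B model", 7e9)]:
        for bits in (16, 8, 4):
            gb = weight_bytes(params, bits) / 1e9
            print(f"{name} @ {bits}-bit: ~{gb:.2f} GB of weights")
```

Under this rough model, a 7-billion-parameter model needs about 14 GB for 16-bit weights but only about 3.5 GB at 4 bits. That gap is the reason quantization matters so much on memory-constrained devices.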
- The framework was developed by a team at Salesforce AI Research.
- MobileAIBench is an open-source, two-part framework: a desktop library for running evaluations and a mobile app for measuring on-device latency and hardware use.
- Beyond latency, the framework measures device-oriented metrics such as CPU and RAM usage, as well as Battery Drain Rate (BDR).
- A primary goal of the research is to analyze how quantization, a model compression technique, affects task performance, trust, and safety for models with up to 7 billion parameters.
- The on-device app is designed to test specific quantized models in the `.gguf` format, including TinyLlama-1.1B, Phi-2, Gemma-2B, and the multimodal model Llava-Phi-2.
- The initiative reflects a broader industry trend toward using the dedicated Neural Processing Units (NPUs) in modern mobile processors, such as Apple's A-series and Qualcomm's Snapdragon chips.
- Running AI models on-device offers significant advantages for embedded applications: it enhances user privacy, keeps working without network connectivity, and allows deeper personalization.
- While other tools such as Geekbench AI and MLPerf Client benchmark AI on various hardware, MobileAIBench stands out for its task-specific evaluation of both large language and multimodal models on mobile systems.
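To make the quantization idea concrete, the sketch below implements symmetric, block-wise quantization in plain Python. Each block of weights is mapped to small signed integers that share one floating-point scale per block. This is an illustration of the general technique only. It is an assumption that it resembles what quantized `.gguf` models do: real GGUF quantization types use their own packing, block sizes, and rounding. The function names and the default block size of 32 are hypothetical choices made for clarity.

```python
"""Minimal sketch of symmetric block-wise weight quantization.

Illustrative only: this is NOT the exact GGUF/llama.cpp layout; block size
and rounding choices here are assumptions chosen for clarity.
"""
from typing import List, Tuple

Block = Tuple[List[int], float]  # (quantized integers, shared scale)


def quantize_block(block: List[float], bits: int) -> Block:
    """Map floats to signed ints in [-(2^(bits-1) - 1), 2^(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(x) for x in block)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid dividing by zero
    # Round to the nearest integer step, clamping as a guard against float error.
    q = [max(-qmax, min(qmax, round(x / scale))) for x in block]
    return q, scale


def dequantize_block(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats from integers and the block's scale."""
    return [v * scale for v in q]


def quantize(weights: List[float], bits: int = 4, block_size: int = 32) -> List[Block]:
    """Quantize a flat weight list block by block (one scale per block)."""
    return [
        quantize_block(weights[start:start + block_size], bits)
        for start in range(0, len(weights), block_size)
    ]


def dequantize(blocks: List[Block]) -> List[float]:
    """Concatenate the dequantized blocks back into one flat list."""
    out: List[float] = []
    for q, scale in blocks:
        out.extend(dequantize_block(q, scale))
    return out


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.9, -0.07, 0.0, 0.41, -0.88]
    for bits in (8, 4):
        restored = dequantize(quantize(weights, bits=bits, block_size=4))
        err = max(abs(a - b) for a, b in zip(weights, restored))
        print(f"{bits}-bit max abs error: {err:.4f}")
```

Fewer bits per weight shrink memory use but increase rounding error, which is bounded by half of each block's scale. That trade-off is exactly what a benchmark must measure when it reports how quantization affects task performance, trust, and safety.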