New Benchmark for On-Device AI Models Released

A new benchmarking suite called MobileAIBench has been introduced to evaluate large language and multimodal models for on-device applications. The benchmark assesses accuracy, latency, memory footprint, and energy consumption, which are critical metrics for resource-constrained aerospace systems. Early results confirm that techniques like quantization and pruning are essential for running advanced AI models on embedded hardware.

- MobileAIBench was developed by researchers from Salesforce AI Research to provide a standardized way to evaluate open-source LLMs and Large Multimodal Models (LMMs) specifically for mobile performance.
- The framework consists of two main parts: a desktop library for running evaluations on tasks like NLP and trust & safety, and a mobile app for both iOS and Android to measure on-device hardware utilization.
- Specific open-source models tested in the benchmark include TinyLlama-1.1B, Phi-2, Gemma-2B, StableLM-Zephyr-3B, and the multimodal model Llava-Phi-2, all in the `.gguf` format.
- Beyond latency and memory, the on-device app is designed to capture detailed hardware metrics, including CPU, RAM, GPU, and thermal state, using platform-specific tools like Apple's Instruments.
- The benchmark goes beyond traditional NLP accuracy metrics to explicitly include evaluations for trust and safety, a critical consideration for models deployed in sensitive applications.
- While other benchmarks have focused on the accuracy impact of quantization, MobileAIBench is designed to reveal the practical deployment challenges by measuring mobile-specific metrics like battery drain on real-world devices.
- The approach of combining pruning and quantization, which MobileAIBench evaluates, has been shown in other applications to reduce model size by up to 75% and power consumption by 50% while maintaining over 95% accuracy.
- The underlying evaluation framework for the mobile app relies on the `llama.cpp` inference engine, a popular choice for running LLMs efficiently on consumer hardware using C/C++.
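
The size reduction of up to 75% quoted above matches simple bit-width arithmetic: storing weights in 8 bits instead of 32, or 4 bits instead of 16, cuts weight storage by three quarters. The Python sketch below is a back-of-the-envelope illustration only and is not part of MobileAIBench. The function names are hypothetical, and the parameter counts are nominal values read off the model names. Real `.gguf` quantization formats also store scaling metadata, so actual files come out somewhat larger than this estimate.

```python
def weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def size_reduction(bits_from: float, bits_to: float) -> float:
    """Fraction of weight storage saved by lowering the bit width."""
    return 1 - bits_to / bits_from


# Nominal parameter counts taken from the model names (assumed, not measured).
models = {
    "TinyLlama-1.1B": 1.1e9,
    "Gemma-2B": 2.0e9,
    "StableLM-Zephyr-3B": 3.0e9,
}

for name, params in models.items():
    fp16 = weight_size_gb(params, 16)
    q4 = weight_size_gb(params, 4)
    saved = size_reduction(16, 4)
    print(f"{name}: {fp16:.2f} GB at 16-bit, {q4:.2f} GB at 4-bit ({saved:.0%} smaller)")
```

The estimate covers weights only. Runtime memory also includes activations and the key-value cache, which is one reason to measure on-device footprint directly instead of relying on file size.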
