Hardware-Software Co-Optimization for Edge AI
"The best results come from teams who design ML models and hardware in conversation, not isolation. It’s a feedback loop—each informs the other’s constraints and potentials." This observation from the Deep Tech Leaders podcast underscores the value of an integrated design approach for creating efficient edge AI systems.
- Apple's on-device AI relies on the Neural Engine, first introduced in the A11 Bionic chip in 2017. Its throughput has grown nearly 60-fold since then, reaching 35 trillion operations per second in the A17 Pro. This tight hardware-software integration lets features such as Face ID and Photos categorization run entirely on the device, which enhances user privacy.
- To run large language models (LLMs) that exceed a device's DRAM capacity, Apple researchers developed a method that stores model parameters in flash memory and transfers them to DRAM on demand. It is combined with "windowing", which reuses neurons already loaded for recent tokens so that only newly needed ones are read, and "row-column bundling", which stores related weights contiguously so they can be read from flash in larger chunks. Together these techniques run models up to twice the size of available DRAM, with inference 4-5x faster on CPUs and 20-25x faster on GPUs than naive loading approaches.
- The co-design approach extends to Apple's Private Cloud Compute, which handles more complex AI tasks that cannot run locally. Only the data needed for a given request is sent to a secure cloud environment built on Apple silicon, and the system is designed so that user data remains private and inaccessible even to Apple.
- In manufacturing, co-design supports AI-driven supply chain optimization: AI algorithms analyze operational data to optimize inventory levels, forecast demand, and improve logistics, with some companies reporting up to a 25% improvement in delivery times and a 20% reduction in logistics costs.
- Siemens applies co-design principles in manufacturing by integrating AI across its Siemens Xcelerator portfolio, in partnership with companies such as Microsoft and AWS. This work includes "Co-Pilots" that guide manufacturing engineering and planning, reducing the specialist expertise these tasks require.
- Looking forward, the field is exploring more specialized AI accelerators for tasks such as computer vision, as well as neuromorphic hardware that mimics the brain's structure to achieve ultra-low power consumption.
- Automated co-design methodologies are becoming more critical as AI models and hardware grow more complex. Researchers use techniques such as Bayesian optimization to search the vast joint space of hardware and software design choices, reporting 18% to 40% improvements in energy-delay product for models such as ResNet and DQN over hand-tuned systems.
- Key players in the edge AI hardware market include Apple, Qualcomm, NVIDIA, Intel, and Samsung. The market is highly consolidated: estimates put the top five companies' combined share at roughly 80-91%.
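The windowing idea from the flash-memory bullet can be illustrated with a minimal Python sketch. It assumes the set of active neurons for each token is already known (in practice these must be predicted). The `WindowedNeuronCache` class, the dictionary standing in for flash, and the bundle layout are hypothetical illustrations, not Apple's implementation. The sketch shows why windowing cuts flash traffic: only neurons that become active and are not already resident are read, and a neuron leaves DRAM once no token in the recent window uses it.

```python
from collections import deque


class WindowedNeuronCache:
    """Hypothetical sketch of windowing over a flash-resident FFN layer."""

    def __init__(self, flash_bundles, window):
        # flash_bundles stands in for flash storage: neuron id -> bundle.
        # With row-column bundling, a bundle holds that neuron's up-projection
        # row and down-projection column together, so one contiguous read
        # fetches both.
        self.flash = flash_bundles
        self.recent = deque(maxlen=window)  # active-neuron sets of the last `window` tokens
        self.dram = {}                      # neuron id -> bundle currently resident in DRAM
        self.loads = 0                      # number of bundle reads from flash

    def step(self, active):
        """Process one token whose active neurons are `active`; return ids loaded from flash."""
        active = set(active)
        self.recent.append(active)
        needed = set().union(*self.recent)
        # Evict neurons that no token in the current window used.
        for nid in [n for n in self.dram if n not in needed]:
            del self.dram[nid]
        # Load only the delta: active neurons that are not already resident.
        loaded = sorted(n for n in active if n not in self.dram)
        for nid in loaded:
            self.dram[nid] = self.flash[nid]
            self.loads += 1
        return loaded


# Usage: four tokens with overlapping active-neuron sets and a window of 2.
flash = {i: ([0.1] * 4, [0.2] * 4) for i in range(8)}  # (up row, down column) per neuron
cache = WindowedNeuronCache(flash, window=2)
for token_neurons in ([0, 1, 2], [1, 2, 3], [2, 3, 4], [5]):
    cache.step(token_neurons)
print(cache.loads)  # 6, versus 10 if every token reloaded all of its neurons
```

A larger window keeps more neurons resident (more DRAM) in exchange for fewer flash reads; the window size is the knob that trades memory for I/O.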
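The Bayesian optimization approach to automated co-design can be sketched as a small search loop: a Gaussian-process surrogate is fit to the design points evaluated so far, and the next point is chosen by expected improvement. Everything concrete here is an assumption for illustration. The three design knobs, the toy `edp` cost model (standing in for a real accelerator simulator), and the kernel length scale and noise are hypothetical placeholders, and this is not the method behind the reported 18% to 40% results. The code is dependency-free Python, so the GP linear algebra uses a hand-written Cholesky factorization.

```python
import itertools
import math
import random

# Hypothetical design space: (PE-array width, on-chip buffer in KB, loop-tile size).
SPACE = list(itertools.product([4, 8, 16, 32], [32, 64, 128, 256], [8, 16, 32]))


def edp(pe, buf_kb, tile):
    """Toy analytical cost model (an assumption, not a real simulator)."""
    spills = tile * tile * pe * 4 / 1024 > buf_kb  # tile working set overflows the buffer
    delay = 1000.0 / pe + 20.0 * (32 / tile) + (300.0 if spills else 0.0)
    energy = 3.0 * pe + 0.1 * buf_kb + (150.0 if spills else 0.0)
    return energy * delay  # energy-delay product


def features(cfg):
    # Map each knob to [0, 1] on a log2 scale so the kernel sees comparable distances.
    lo, hi = (4, 32, 8), (32, 256, 32)
    return [math.log2(v / l) / math.log2(h / l) for v, l, h in zip(cfg, lo, hi)]


def rbf(a, b, length=0.3):
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * length ** 2))


def cholesky(k):
    n = len(k)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][i] = math.sqrt(max(k[i][i] - s, 1e-12))
            else:
                low[i][j] = (k[i][j] - s) / low[j][j]
    return low


def forward(low, b):  # solve L x = b
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[m] * x[m] for m in range(i))) / row[i])
    return x


def backward(low, b):  # solve L^T x = b
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[m][i] * x[m] for m in range(i + 1, n))) / low[i][i]
    return x


def expected_improvement(mu, sigma, best):
    # EI for minimization: E[max(best - f(x), 0)] under the GP posterior N(mu, sigma^2).
    z = (best - mu) / sigma
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def bayes_opt(budget=20, n_init=5, noise=1e-3, seed=0):
    assert n_init <= budget <= len(SPACE)
    history = {cfg: edp(*cfg) for cfg in random.Random(seed).sample(SPACE, n_init)}
    while len(history) < budget:
        xs = [features(c) for c in history]
        ys = [math.log(v) for v in history.values()]  # model log-EDP
        mean = sum(ys) / len(ys)
        std = (sum((y - mean) ** 2 for y in ys) / len(ys)) ** 0.5 or 1.0
        ys = [(y - mean) / std for y in ys]  # standardize targets
        k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
             for i, a in enumerate(xs)]
        low = cholesky(k)
        alpha = backward(low, forward(low, ys))  # K^{-1} y
        best, pick, pick_ei = min(ys), None, -1.0
        for cfg in SPACE:
            if cfg in history:
                continue
            ks = [rbf(features(cfg), x) for x in xs]
            mu = sum(a * b for a, b in zip(ks, alpha))  # posterior mean
            v = forward(low, ks)
            sigma = math.sqrt(max(1.0 - sum(t * t for t in v), 1e-12))  # posterior std
            ei = expected_improvement(mu, sigma, best)
            if ei > pick_ei:
                pick, pick_ei = cfg, ei
        history[pick] = edp(*pick)  # evaluate the most promising design
    best_cfg = min(history, key=history.get)
    return best_cfg, history[best_cfg], history
```

Calling `bayes_opt(budget=20)` evaluates 20 of the 48 candidate designs and returns the best one found. The surrogate models log-EDP rather than raw EDP because the product of energy and delay spans a wide range, which a stationary kernel fits poorly.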