MLX Framework Enables Distributed AI on Apple Silicon
A recent talk showcased how Apple's MLX framework can be used for distributed inference and training across multiple Apple silicon devices. The presentation, available on YouTube, is aimed at developers looking to optimize AI/ML workloads by leveraging the GPU, unified memory, and other SoC features for more complex, on-device tasks.
Apple's machine learning research team released the open-source framework in December 2023. Its APIs are modeled on NumPy and PyTorch so that they feel familiar to developers, with the aim of simplifying the creation and deployment of models on Apple hardware.
The framework's key architectural advantage is its unified memory model. Unlike systems that copy data between separate CPU and GPU memory, MLX keeps arrays in a shared memory pool that both processors can access, so operations can run on either device without a transfer step. MLX also uses lazy computation: arrays are materialized only when their results are needed, which lets the framework skip work that is never used. Computation graphs are built dynamically, so changing input shapes does not trigger slow recompilation and debugging stays straightforward.
Beyond its core Python interface, MLX provides fully featured APIs in C++, C, and Swift that mirror the structure of the Python API. This allows for deeper integration and performance tuning across Apple's software ecosystem.
In one direct comparison, inference with the Phi-2 language model ran three times faster on MLX than on a PyTorch implementation using the same M1 Pro GPU. The gain was attributed to MLX driving the GPU to a much higher clock frequency during the task.
The framework is distinct from Core ML, which is primarily for converting and optimizing pre-existing models. MLX focuses on creating, training, and running machine learning models directly on Apple silicon. Developers are already using it to run a wide range of models locally, including large language models like LLaMA, image generators like Stable Diffusion, and speech recognition models like OpenAI's Whisper.
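The lazy computation described above can be illustrated with a small pure-Python toy. This is a conceptual sketch only, not MLX's implementation: the `LazyArray` class and its `from_values` and `eval` methods are hypothetical names invented for this example. In MLX itself, evaluation is triggered explicitly with `mx.eval` or implicitly when a value is needed, for example when an array is printed.

```python
# Toy model of lazy computation. NOT MLX code: LazyArray is a hypothetical
# class used only to show how work can be deferred until a result is needed.

class LazyArray:
    """A graph node whose value is computed only when eval() is called."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument function producing the values
        self._value = None
        self.evaluated = False

    @classmethod
    def from_values(cls, values):
        # Leaf node: copies the input data when first evaluated.
        return cls(lambda: list(values))

    def __add__(self, other):
        # Record the addition; nothing is computed here.
        return LazyArray(
            lambda: [x + y for x, y in zip(self.eval(), other.eval())]
        )

    def __mul__(self, other):
        # Record the elementwise product; nothing is computed here.
        return LazyArray(
            lambda: [x * y for x, y in zip(self.eval(), other.eval())]
        )

    def eval(self):
        # Materialize the value once and cache it for later calls.
        if not self.evaluated:
            self._value = self._compute()
            self.evaluated = True
        return self._value


a = LazyArray.from_values([1.0, 2.0, 3.0])
b = LazyArray.from_values([4.0, 5.0, 6.0])

c = a + b   # builds a graph node only
d = c * c   # still nothing computed

pending = d.evaluated   # False: no arithmetic has run yet
result = d.eval()       # forces evaluation of d and everything it depends on
```

Calling `d.eval()` walks back through `c`, `a`, and `b`, computing each exactly once. A node that is built but never evaluated costs nothing beyond the graph bookkeeping, which is the property the article credits for the performance benefit.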