Spotify Releases Lightweight 'Basic Pitch' ML Model
Spotify Engineering has open-sourced 'Basic Pitch,' an audio-to-MIDI converter built on a lightweight machine learning model. The shallow architecture is designed for speed and versatility, enabling real-time conversion with a minimal resource footprint suitable for consumer-facing music applications.
- The model's architecture builds on CREPE (Convolutional Representation for Pitch Estimation), a convolutional neural network that analyzes the raw audio waveform, and adds custom onset and offset detectors to define the start and end times of individual notes.
- Basic Pitch was presented at the 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). The model is instrument-agnostic and polyphonic, and it detects pitch bends, an expressive detail that other audio-to-MIDI converters often lose.
- Its lightweight design, with fewer than 17,000 parameters and under 20 MB of peak memory usage, was a direct response to the trend toward massive, computationally expensive ML models. It achieves this efficiency by using a harmonic constant-Q transform as input and a shallow architecture with few layers.
- The project is open-sourced under the Apache 2.0 license, which permits commercial use, and is available as both a Python library and an npm package.
- For deployment versatility, the model ships in multiple formats, including TensorFlow, TensorFlow Lite, Core ML for Apple devices, and ONNX for cross-platform interoperability, reflecting a production-oriented MLOps approach.
- The tool was developed by Spotify's Audio Intelligence Lab in partnership with Soundtrap, the company's browser-based online digital audio workstation (DAW).
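
To make the note-timing and pitch-bend ideas above concrete, here is a minimal sketch in plain Python. It is not Basic Pitch's actual post-processing. Both function names, the fixed activation threshold, the frame duration, and the ±2 semitone bend range are assumptions chosen for illustration. The first function maps a frequency to the nearest MIDI note plus a 14-bit pitch-bend offset. The second turns per-frame activations for a single pitch into (onset, offset) times by simple thresholding.

```python
import math


def hz_to_midi_with_bend(freq_hz, bend_range_semitones=2.0):
    """Map a frequency to (MIDI note number, signed 14-bit pitch-bend offset).

    Hypothetical helper: the +/-2 semitone bend range is an assumed default.
    """
    # Fractional MIDI pitch: A4 = 440 Hz = note 69, 12 semitones per octave.
    midi_pitch = 69.0 + 12.0 * math.log2(freq_hz / 440.0)
    note = round(midi_pitch)
    # Deviation from the nearest note, in semitones (roughly -0.5 to +0.5).
    deviation = midi_pitch - note
    # Scale so that the full bend range corresponds to 8192 steps.
    bend = round(deviation / bend_range_semitones * 8192)
    # Clamp to the signed 14-bit range used by MIDI pitch-bend messages.
    return note, max(-8192, min(8191, bend))


def frames_to_notes(activations, threshold=0.5, frame_seconds=0.01):
    """Turn per-frame activations for one pitch into (onset, offset) times.

    Hypothetical helper: a note starts when the activation first reaches the
    threshold and ends at the first frame where it falls below it.
    """
    notes = []
    onset = None
    for i, value in enumerate(activations):
        active = value >= threshold
        if active and onset is None:
            onset = i  # note begins at this frame
        elif not active and onset is not None:
            notes.append((onset * frame_seconds, i * frame_seconds))
            onset = None
    # Close a note that is still sounding when the clip ends.
    if onset is not None:
        notes.append((onset * frame_seconds, len(activations) * frame_seconds))
    return notes
```

A converter that keeps only the rounded note number discards the deviation term, which is the expressive detail a pitch-bend-aware model preserves.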