Spotify Releases Lightweight 'Basic Pitch' ML Model
What happened
Spotify Engineering has open-sourced 'Basic Pitch,' an audio-to-MIDI converter built on a lightweight machine learning model. The shallow architecture is designed for speed and versatility, enabling real-time conversion with a minimal resource footprint suitable for consumer-facing music applications.
Why it matters
- The model’s architecture builds on CREPE (Convolutional Representation for Pitch Estimation), a convolutional neural network that analyzes the raw audio waveform, and adds custom onset and offset detectors that mark the start and end times of individual notes.
- Basic Pitch was presented at the 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). The model is instrument-agnostic and polyphonic, and it detects pitch bends, an expressive detail that other audio-to-MIDI converters often lose.
- Its lightweight design, with fewer than 17,000 parameters and under 20 MB of peak memory usage, is a direct response to the trend toward massive, computationally expensive ML models. The efficiency comes from using a harmonic constant-Q transform as input and a shallow architecture with few layers.
- The project is released under the Apache 2.0 license, which permits commercial use, and is available as both a Python library and an npm package.
- For deployment versatility, the model ships in multiple formats: TensorFlow, TensorFlow Lite, Core ML for Apple devices, and ONNX for cross-platform interoperability.
- The tool was developed by Spotify's Audio Intelligence Lab in partnership with Soundtrap, the company's browser-based digital audio workstation (DAW).
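To make the harmonic constant-Q input more concrete, here is a minimal sketch of the general idea of harmonic stacking: on a log-frequency axis, the harmonics of a note sit at fixed bin offsets, so shifted copies of one constant-Q frame can be stacked as channels that line those harmonics up. This is an illustration and not Basic Pitch's actual code. The function names, the 36 bins per octave, and the harmonic list are assumptions chosen for the example.

```python
import math


def harmonic_bin_shifts(harmonics, bins_per_octave):
    """For each harmonic multiple h, return how many log-frequency bins
    separate a frequency f from h * f on a constant-Q axis."""
    return {h: round(bins_per_octave * math.log2(h)) for h in harmonics}


def stack_harmonics(cqt_frame, shifts):
    """Build one channel per harmonic by shifting a single CQT frame.

    cqt_frame is a list of magnitudes, one per frequency bin. For a shift s,
    channel[k] = cqt_frame[k + s], zero-padded outside the valid range, so
    bin k of every channel refers to a harmonic of the same candidate note.
    """
    n = len(cqt_frame)
    return {
        h: [cqt_frame[k + s] if 0 <= k + s < n else 0.0 for k in range(n)]
        for h, s in shifts.items()
    }


# Assumed settings, for illustration only.
BINS_PER_OCTAVE = 36          # 3 bins per semitone
HARMONICS = [0.5, 1, 2, 3]    # one sub-harmonic, the fundamental, two overtones

shifts = harmonic_bin_shifts(HARMONICS, BINS_PER_OCTAVE)

# Toy frame: a note with its fundamental at bin 40 and overtones at 2f and 3f.
frame = [0.0] * 120
for h in (1, 2, 3):
    frame[40 + shifts[h]] = 1.0 / h

stacked = stack_harmonics(frame, shifts)

# At bin 40, the fundamental and overtone channels are all active at once.
print(shifts)
print([stacked[h][40] for h in HARMONICS])
```

Because the energy of one note lands in the same bin across channels, a small convolutional kernel can pick up harmonic structure without a deep stack of layers, which is consistent with the shallow design described above.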
Key numbers
- 2022: the year Basic Pitch was presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
- Fewer than 17,000 parameters and under 20 MB of peak memory usage.
- Two distribution channels, a Python library and an npm package, both under the Apache 2.0 license.
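A quick back-of-the-envelope calculation puts the parameter count in perspective. The sketch below assumes common storage precisions; the source does not say which one the released model uses, so the dtype table is an assumption, as are the variable and function names.

```python
PARAM_COUNT = 17_000      # upper bound quoted above
PEAK_MEMORY_MB = 20       # upper bound quoted above

# Standard byte widths; which one the released model uses is an assumption.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_size_kb(param_count, dtype):
    """Size of the raw weights in kilobytes (1 KB = 1000 bytes)."""
    return param_count * BYTES_PER_PARAM[dtype] / 1000


sizes = {dtype: weight_size_kb(PARAM_COUNT, dtype) for dtype in BYTES_PER_PARAM}

# Share of the 20 MB peak that float32 weights would account for.
float32_share = sizes["float32"] / (PEAK_MEMORY_MB * 1000)

print(sizes)
print(f"{float32_share:.2%}")
```

Even at 32-bit precision the weights come to at most 68 KB, well under 1% of the quoted peak memory. Most of that peak is therefore presumably spent on audio buffers, intermediate activations, and the runtime rather than on the model weights.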