Spotify Releases Lightweight 'Basic Pitch' ML Model

Spotify Engineering has open-sourced 'Basic Pitch,' an audio-to-MIDI converter built on a lightweight machine learning model. The shallow architecture is designed for speed and versatility, enabling real-time conversion with a minimal resource footprint suitable for consumer-facing music applications.

- The model's architecture builds on CREPE (Convolutional Representation for Pitch Estimation), a convolutional neural network that estimates pitch from the raw audio waveform, and adds custom onset and offset detectors to define the start and end times of individual notes.
- Basic Pitch was presented at the 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). The model is instrument-agnostic and polyphonic, and it detects pitch bends, an expressive detail often lost by other audio-to-MIDI converters.
- Its lightweight design, with fewer than 17,000 parameters and under 20 MB of peak memory usage, was a direct response to the trend toward massive, computationally expensive ML models. The efficiency comes from techniques such as using a harmonic constant-Q transform (HCQT) as the input representation and a shallow architecture with few layers.
- The project is open-sourced under the Apache 2.0 license, which permits commercial use, and is available as both a Python library and an npm package.
- For deployment versatility, the model ships in multiple formats, including TensorFlow, TensorFlow Lite, Core ML for Apple devices, and ONNX for cross-platform interoperability, reflecting a production-oriented MLOps approach.
- The tool was developed by Spotify's Audio Intelligence Lab in partnership with Soundtrap, Spotify's browser-based digital audio workstation (DAW).
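
The idea of using onset and offset detection to mark note boundaries can be illustrated with a simplified decoder. This is a minimal sketch, not Basic Pitch's actual post-processing. It assumes the model outputs two probabilities per analysis frame for a single pitch: one for onsets and one for note activity. The names `decode_notes` and `frames_to_seconds` and the threshold values are hypothetical.

```python
from typing import List, Sequence, Tuple


def decode_notes(
    onsets: Sequence[float],
    frames: Sequence[float],
    onset_threshold: float = 0.5,   # hypothetical default
    frame_threshold: float = 0.3,   # hypothetical default
) -> List[Tuple[int, int]]:
    """Turn per-frame onset and note-activity probabilities for ONE pitch
    into (start_frame, end_frame) pairs; end_frame is exclusive.

    Simplified illustration, not Basic Pitch's real decoding logic.
    """
    notes: List[Tuple[int, int]] = []
    t, n = 0, len(frames)
    while t < n:
        # A note may only start where the onset detector fires
        # and the pitch is also active in the frame-level output.
        if onsets[t] >= onset_threshold and frames[t] >= frame_threshold:
            start = t
            t += 1
            # Extend the note while it stays active and no new onset
            # (a re-articulation of the same pitch) appears.
            while t < n and frames[t] >= frame_threshold and onsets[t] < onset_threshold:
                t += 1
            # The first inactive frame, or a fresh onset, marks the offset.
            notes.append((start, t))
        else:
            t += 1
    return notes


def frames_to_seconds(notes: List[Tuple[int, int]], frame_rate_hz: float) -> List[Tuple[float, float]]:
    """Convert frame indices to seconds; the frame rate is model-specific."""
    return [(s / frame_rate_hz, e / frame_rate_hz) for s, e in notes]
```

With `onsets = [0.1, 0.9, 0.2, 0.1, 0.1, 0.8, 0.1, 0.0]` and `frames = [0.1, 0.9, 0.8, 0.7, 0.1, 0.9, 0.6, 0.1]`, the decoder finds two notes, spanning frames 1 to 4 and 5 to 7. Converting them to seconds requires the model's real frame rate, which this article does not state.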
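
Detected pitch bends ultimately have to be written as MIDI pitch-bend messages. The sketch below shows the standard MIDI conversion, not Basic Pitch's own code. A frequency is mapped to a fractional MIDI note number (A4 = 440 Hz = note 69). Its deviation from the target note is then scaled into the 14-bit pitch-bend range (0 to 16383, where 8192 means no bend). The ±2-semitone bend range is an assumption; it is a common default, but receivers can be configured differently. The helper names are hypothetical.

```python
import math


def hz_to_midi(freq_hz: float) -> float:
    """Fractional MIDI note number, using A4 = 440 Hz = note 69."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def pitch_bend_value(freq_hz: float, midi_note: int, bend_range_semitones: float = 2.0) -> int:
    """Map a frequency's deviation from `midi_note` to a 14-bit MIDI
    pitch-bend value (0..16383, 8192 = no bend).

    bend_range_semitones is an assumed receiver setting.
    """
    deviation = hz_to_midi(freq_hz) - midi_note  # in semitones
    value = round(8192 + deviation / bend_range_semitones * 8192)
    # Clamp: the top of the 14-bit range is 16383, not 16384.
    return max(0, min(16383, value))
```

Under these assumptions, a tone one semitone above A4 gives a bend value of 12288. Bends wider than the configured range saturate at 0 or 16383, which is why converters typically need to know, or set, the receiver's bend range.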
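
The harmonic constant-Q transform mentioned above can be illustrated by computing its bin center frequencies. This is a minimal sketch of the general HCQT idea, not Basic Pitch's implementation. The function name and the example settings (32.7 Hz lowest bin, 36 bins per octave, harmonics 0.5 to 3) are hypothetical choices. Each channel h is an ordinary constant-Q transform whose lowest bin starts at h × f_min. Stacking the channels therefore places a note's harmonics at the same bin index.

```python
from typing import Dict, List, Sequence


def hcqt_center_frequencies(
    f_min: float,
    bins_per_octave: int,
    n_bins: int,
    harmonics: Sequence[float],
) -> Dict[float, List[float]]:
    """Center frequency of every bin in every HCQT channel.

    Channel h is a CQT whose lowest bin is h * f_min, so bin k of
    channel h sits at h times the frequency of bin k in channel 1.
    """
    return {
        h: [h * f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]
        for h in harmonics
    }


# Hypothetical settings: two octaves from roughly C1, 3 bins per semitone.
grid = hcqt_center_frequencies(32.7, 36, 72, [0.5, 1, 2, 3])
```

Because harmonic partials share a bin index across channels, even a small convolution kernel can relate a fundamental to its overtones. This is the usual motivation for pairing an HCQT input with a shallow network.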
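
Some quick arithmetic puts the parameter count in perspective. It assumes each weight is stored as a 32-bit float (4 bytes) and that 1 MB = 10^6 bytes. Quantized builds would be smaller, so the storage format is an assumption.

```latex
17{,}000 \times 4\,\mathrm{B} = 68{,}000\,\mathrm{B} \approx 68\,\mathrm{kB},
\qquad
\frac{68{,}000\,\mathrm{B}}{20 \times 10^{6}\,\mathrm{B}} = 0.0034 = 0.34\%
```

Even as an upper bound, the weights amount to roughly a third of a percent of the 20 MB peak figure. Most of the peak memory therefore presumably goes to audio buffers, intermediate activations, and runtime overhead rather than to the model weights.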
