Open-Amp: Synthetic Data Framework for Audio Effect Foundation Models

Alec Wright1, Alistair Carson1, and Lauri Juvela2

1Acoustics and Audio Group, University of Edinburgh, Edinburgh, UK
2Department of Information and Communications Engineering (DICE), Aalto University, Espoo, Finland

🗞️ Paper  </> Code  🔊 Audio examples

Abstract

This paper introduces Open-Amp, a synthetic data framework for generating large-scale and diverse audio effects data. Audio effects are relevant to many musical audio processing and Music Information Retrieval (MIR) tasks, such as modelling of analog audio effects, automatic mixing, tone matching and transcription. Existing audio effects datasets are limited in scope, usually including relatively few audio effects processors and a limited number of input audio signals. Our proposed framework overcomes these issues by crowdsourcing neural network emulations of guitar amplifiers and effects created by users of open-source audio effects emulation software. This gives users of Open-Amp complete control over the input signals to be processed by the effects models, while providing high-quality emulations of hundreds of devices. Open-Amp can render audio online during training, allowing great flexibility in data augmentation. Our experiments show that using Open-Amp to train a guitar effects encoder achieves new state-of-the-art results on multiple guitar effects classification tasks. Furthermore, we train a one-to-many guitar effects model using Open-Amp, and use it to emulate unseen analog effects via manipulation of its learned latent space, indicating transferability to analog guitar effects data.
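To make the online-rendering idea concrete, below is a minimal sketch of how a training pipeline might draw clean/effected pairs on the fly from a bank of crowdsourced effect emulations. The class and variable names are hypothetical illustrations, not the actual Open-Amp API:

```python
# Hypothetical sketch of online rendering during training: each item pairs a
# clean clip with the output of a randomly chosen pretrained effect emulation.
# Class and variable names here are illustrative, not the Open-Amp API.
import random
import torch
from torch.utils.data import Dataset

class OnTheFlyEffectDataset(Dataset):
    def __init__(self, clean_clips, effect_models):
        self.clean_clips = clean_clips        # list of 1-D float tensors
        self.effect_models = effect_models    # list of frozen nn.Module emulations

    def __len__(self):
        return len(self.clean_clips)

    def __getitem__(self, idx):
        clean = self.clean_clips[idx]
        model_id = random.randrange(len(self.effect_models))
        with torch.no_grad():                 # emulations are fixed, no gradients
            wet = self.effect_models[model_id](clean.view(1, 1, -1)).view(-1)
        return clean, wet, model_id           # model_id can serve as a class label
```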

One-to-many audio effect emulation

In the paper we proposed using the Open-Amp dataset to train a one-to-many guitar effect emulation model. We used a temporal convolutional network (TCN) architecture and conditioned it on a learnable look-up table of embeddings. The model learns one embedding vector per (virtual) amp or pedal in the Open-Amp dataset, so at inference we can render a specific guitar tone by selecting the relevant embedding from the look-up table and processing audio through the TCN. This also allows us to enrol unseen analog devices into the pre-trained one-to-many model by learning a new embedding within the existing space. In this demo we show audio examples of:

1. Open-Amp devices seen during training

The tables below contain audio examples from five virtual devices within the Open-Amp dataset (seen during training):

For each device we can compare the rendered audio outputs from the proposed one-to-many models (Emb-16, Emb-64 and Emb-256), a baseline 1-to-1 TCN model, and the target tone rendered from the Open-Amp dataset. NB: the five devices were selected (and ordered below) based on the minimum, 25th percentile, median, 75th percentile and maximum test loss of the Emb-64 model, to show the spread of modelling capabilities. The test losses are shown in Table III in the paper.
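As a rough illustration of how this conditioning might look, here is a minimal sketch of an embedding-conditioned TCN. The layer sizes, the FiLM-style conditioning, and the assumption that the Emb-N names refer to the embedding dimensionality N are ours, not details taken from the paper:

```python
# Minimal sketch of a one-to-many TCN conditioned on a per-device embedding
# look-up table. Layer sizes and the FiLM-style conditioning are assumptions;
# only the one-embedding-per-device idea comes from the paper.
import torch
import torch.nn as nn

class ConditionedTCN(nn.Module):
    def __init__(self, n_devices, emb_dim=64, channels=16, n_blocks=8):
        super().__init__()
        self.embeddings = nn.Embedding(n_devices, emb_dim)   # one vector per device
        self.input = nn.Conv1d(1, channels, kernel_size=1)
        self.blocks = nn.ModuleList()
        self.films = nn.ModuleList()
        for b in range(n_blocks):
            self.blocks.append(nn.Conv1d(channels, channels, kernel_size=3,
                                         dilation=2 ** b, padding=2 ** b))
            self.films.append(nn.Linear(emb_dim, 2 * channels))  # scale and shift
        self.output = nn.Conv1d(channels, 1, kernel_size=1)

    def forward(self, x, device_id):
        # x: (batch, 1, time), device_id: (batch,) long tensor
        cond = self.embeddings(device_id)
        h = self.input(x)
        for block, film in zip(self.blocks, self.films):
            scale, shift = film(cond).chunk(2, dim=-1)
            h = torch.tanh(block(h)) * scale.unsqueeze(-1) + shift.unsqueeze(-1)
        return self.output(h)
```

At inference, a specific tone is rendered by passing that device's index, e.g. `y = model(x, torch.tensor([device_id]))`; a new device can later be enrolled by optimising one extra embedding vector while everything else stays frozen.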

Clean input signals (unseen during training) used to render the examples:

Clip 1 Clip 2 Clip 3 Clip 4
1.1 EffectrodeBlackbirdClean (PedalPack4)
Model Clip 1 Clip 2 Clip 3 Clip 4
TARGET
Baseline 1-to-1
Emb 16
Emb 64
Emb 256
1.2 ZenDrive_BlackMagic_DriveKnob (PedalPack2)
Model Clip 1 Clip 2 Clip 3 Clip 4
TARGET
Baseline 1-to-1
Emb 16
Emb 64
Emb 256
1.3 Bogner_EcstasyBlue_GainKnob_cond-0.25 (PedalPack2)
Model Clip 1 Clip 2 Clip 3 Clip 4
TARGET
Baseline 1-to-1
Emb 16
Emb 64
Emb 256
1.4 Colombo_Plexi_Knob_cond-0.50 (PedalPack2)
Model Clip 1 Clip 2 Clip 3 Clip 4
TARGET
Baseline 1-to-1
Emb 16
Emb 64
Emb 256
1.5 SoundCity50_ThroneTorcher_DIRECT (AmpPack2)
Model Clip 1 Clip 2 Clip 3 Clip 4
TARGET
Baseline 1-to-1
Emb 16
Emb 64
Emb 256

2. EGFxSet devices – enrolled into pre-trained model using unseen data

In this section we explore the task of enrolling unseen analog effects into the learned embedding space of the foundation one-to-many model. Here we used data from EGFxSet, which consists of ~1 hour of single guitar notes processed through various analog effects devices. In our experiments we froze all parameters of the pre-trained foundation model and learned a new embedding for each of three analog effects pedals unseen during training (Sections 2.1–2.3 below); a rough sketch of this enrolment procedure is given below.
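A minimal sketch of what such an enrolment loop could look like, assuming a PyTorch model with a hypothetical `forward_with_embedding` hook that conditions the TCN on a given vector directly (bypassing the look-up table). The optimiser settings and the plain MSE loss are placeholders, not the training recipe used in the paper:

```python
# Rough sketch of enrolment: all pre-trained weights are frozen and only a new
# embedding vector is optimised on paired clean/effected audio. The optimiser,
# loss and `forward_with_embedding` hook are assumptions for illustration.
import torch

def enrol_device(model, clean, wet, emb_dim=64, steps=1000, lr=1e-3):
    for p in model.parameters():
        p.requires_grad_(False)                 # freeze the foundation model
    new_emb = torch.randn(1, emb_dim, requires_grad=True)
    opt = torch.optim.Adam([new_emb], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # hypothetical variant of forward() that conditions on a vector directly
        pred = model.forward_with_embedding(clean, new_emb)
        loss = torch.nn.functional.mse_loss(pred, wet)   # placeholder loss
        loss.backward()
        opt.step()
    return new_emb.detach()
```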



Input signals (unseen during training) used to render the examples*:

Note 1 Note 2 Note 3 Note 4

*NB: these differ from the input signals used in Section 1, to allow comparison with the target tones in EGFxSet.

2.1 Ibanez TubeScreamer Mini
Note 1 Note 2 Note 3 Note 4
Train data length (four columns per note): ~50 min. (100%), ~5 min. (10%), ~30 s (1%), ~3 s (0.1%)
Model
TARGET
Baseline 1-to-1
Emb 16
Emb 64
Emb 256
2.2 Proco RAT
Note 1 Note 2 Note 3 Note 4
Train data length (four columns per note): ~50 min. (100%), ~5 min. (10%), ~30 s (1%), ~3 s (0.1%)
Model
TARGET
Baseline 1-to-1
Emb 16
Emb 64
Emb 256
2.3 Boss Blues Driver BD-2
Note 1 Note 2 Note 3 Note 4
Train data length (four columns per note): ~50 min. (100%), ~5 min. (10%), ~30 s (1%), ~3 s (0.1%)
Model
TARGET
Baseline 1-to-1
Emb 16
Emb 64
Emb 256