Neural Compression Augmentation for Contrastive Audio Representation Learning

Cited by: 0

Authors:

Wang, Zhaoyu ^{[1]}

Liu, Haohe ^{[2]}

Coppock, Harry ^{[1]}

Schuller, Bjorn ^{[1]}

Plumbley, Mark D. ^{[2]}

Affiliations:

[1] Imperial Coll London, Dept Comp, GLAM, London, England

[2] Univ Surrey, Ctr Vis Speech & Signal Proc CVSSP, Guildford, Surrey, England

Source:

INTERSPEECH 2024 | 2024

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

audio compression; data augmentation; self-supervised learning;

DOI:

10.21437/Interspeech.2024-1156

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The choice of data augmentation is pivotal in contrastive self-supervised learning. Current augmentation techniques for audio data, such as the widely used Random Resize Crop (RRC), underperform in pitch-sensitive music tasks and lack generalisation across various types of audio. This study aims to address these limitations by introducing Neural Compression Augmentation (NCA), an approach based on lossy neural compression. We use the Audio Barlow Twins (ABT), a contrastive self-supervised framework for audio, as our backbone. We experiment with both NCA and several baseline augmentation methods in the augmentation block of ABT and train the models on AudioSet. Experimental results show that models integrated with NCA considerably surpass the original performance of ABT, especially in the music tasks of the HEAR benchmark, demonstrating the effectiveness of compression-based augmentation for audio contrastive self-supervised learning.
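The idea of feeding a lossily compressed view into a Barlow Twins style objective can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not describe the codec or the encoder, so uniform quantization (`quantize`) stands in for the neural codec and a two-feature toy function (`embed`) stands in for the encoder. Both names, the toy batch, and the weight `lam` are assumptions made for illustration. Only the form of the loss (cross-correlation matrix pushed towards the identity) follows the general Barlow Twins objective.

```python
import math


def quantize(wave, levels=16):
    """Stand-in lossy 'codec' (assumption): uniform quantization on [-1, 1]."""
    step = 2.0 / (levels - 1)
    return [round((x + 1.0) / step) * step - 1.0 for x in wave]


def embed(wave):
    """Hypothetical toy encoder: mean energy and zero-crossing rate."""
    energy = sum(x * x for x in wave) / len(wave)
    crossings = sum(1 for p, q in zip(wave, wave[1:]) if (p < 0) != (q < 0))
    return [energy, crossings / (len(wave) - 1)]


def standardize(z):
    """Zero mean, unit variance per embedding dimension across the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in z]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) + 1e-8
        for i in range(n):
            out[i][j] = (col[i] - mean) / std
    return out


def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Sum of (1 - C_ii)^2 plus lam * sum of C_ij^2 for i != j."""
    a, b = standardize(z_a), standardize(z_b)
    n, d = len(a), len(a[0])
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c = sum(a[k][i] * b[k][j] for k in range(n)) / n  # cross-correlation
            loss += (1.0 - c) ** 2 if i == j else lam * c ** 2
    return loss


# Toy batch of sinusoids with different frequencies and amplitudes.
params = [(1, 0.2), (2, 0.9), (3, 0.5), (5, 0.7), (8, 0.3)]
batch = [[a * math.sin(2 * math.pi * f * t / 100) for t in range(100)]
         for f, a in params]

z_clean = [embed(w) for w in batch]                      # view 1: original audio
z_lossy = [embed(quantize(w, levels=16)) for w in batch]  # view 2: compressed audio
print(barlow_twins_loss(z_clean, z_lossy))
```

In a real setup the encoder would be trained to minimise this loss, so that the representation becomes invariant to the artefacts introduced by compression while its dimensions stay decorrelated.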


Pages: 3335-3339

Page count: 5

