Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?

Cited by: 1

Authors:

Zaiem, Salah ^{[1]}

Kemiche, Youcef ^{[2,3]}

Parcollet, Titouan ^{[4,5]}

Essid, Slim ^{[1]}

Ravanelli, Mirco ^{[6]}

Affiliations:

[1] Inst Polytech Paris, Telecom Paris, LTCI, Paris, France

[2] Hi PARIS Engn Team, Paris, France

[3] Capgemini, Paris, France

[4] Samsung AI Ctr, Cambridge, England

[5] Univ Cambridge, Cambridge, England

[6] Concordia Univ, Univ Montreal, Mila Quebec AI Inst, Montreal, PQ, Canada

Source:

INTERSPEECH 2023 | 2023

Keywords:

self-supervised learning; representation learning;

DOI:

10.21437/Interspeech.2023-1087

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206; 082403;

Abstract:

Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks using only small amounts of annotated data. The high number of proposed approaches fostered the need and rise of extended benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, and while the number of considered tasks has been growing, most rely upon a single decoding architecture that maps the frozen SSL representations to the downstream labels. This work investigates the robustness of such benchmarking results to changes in the decoder architecture. Interestingly, it appears that varying the architecture of the downstream decoder leads to significant variations in the leaderboards of most tasks. Concerningly, our study reveals that benchmarking using limited decoders may cause a counterproductive increase in the sizes of the developed SSL models.
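
The abstract's central claim, that a leaderboard can reorder when the downstream decoder changes, can be illustrated with a minimal sketch. The model names and error rates below are made-up placeholders, not results from the paper, and `ranking` and `kendall_tau` are hypothetical helper names introduced only for this illustration. The sketch ranks the same models under two decoders and measures how well the two rankings agree using the Kendall rank correlation (1 means identical order, -1 means fully reversed).

```python
from itertools import combinations


def ranking(scores, lower_is_better=True):
    """Return model names ordered from best to worst for one decoder."""
    return sorted(scores, key=scores.get, reverse=not lower_is_better)


def kendall_tau(rank_a, rank_b):
    """Kendall rank correlation between two orderings of the same items.

    Assumes both rankings contain the same items and have no ties.
    """
    pos_a = {name: i for i, name in enumerate(rank_a)}
    pos_b = {name: i for i, name in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is concordant when both rankings order it the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    n_pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Placeholder error rates (lower is better); NOT taken from the paper.
scores_decoder_1 = {"model_a": 10.0, "model_b": 12.0, "model_c": 15.0}
scores_decoder_2 = {"model_a": 9.0, "model_b": 7.0, "model_c": 8.0}

rank_1 = ranking(scores_decoder_1)
rank_2 = ranking(scores_decoder_2)
tau = kendall_tau(rank_1, rank_2)

print("decoder 1 ranking:", rank_1)
print("decoder 2 ranking:", rank_2)
print("Kendall tau:", round(tau, 3))
```

A correlation well below 1 across decoders would signal that a ranking depends on the decoder choice rather than on the representations alone, which is the kind of sensitivity the abstract describes.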

Pages: 2873-2877

Page count: 5

