Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?

Cited by: 1

Authors:

Zaiem, Salah [1]

Kemiche, Youcef [2,3]

Parcollet, Titouan [4,5]

Essid, Slim [1]

Ravanelli, Mirco [6]

Affiliations:

[1] Inst Polytech Paris, Telecom Paris, LTCI, Paris, France

[2] Hi PARIS Engn Team, Paris, France

[3] Capgemini, Paris, France

[4] Samsung AI Ctr, Cambridge, England

[5] Univ Cambridge, Cambridge, England

[6] Concordia Univ, Univ Montreal, Mila Quebec AI Inst, Montreal, PQ, Canada

Source:

INTERSPEECH 2023 | 2023

Keywords:

self-supervised learning; representation learning

DOI:

10.21437/Interspeech.2023-1087

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks using only small amounts of annotated data. The high number of proposed approaches has fostered the need for, and rise of, extended benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, while the number of considered tasks has been growing, most benchmarks rely on a single decoding architecture that maps the frozen SSL representations to the downstream labels. This work investigates the robustness of such benchmarking results to changes in the decoder architecture. Interestingly, it appears that varying the architecture of the downstream decoder leads to significant variations in the leaderboards of most tasks. Concerningly, our study reveals that benchmarking using limited decoders may cause a counterproductive increase in the sizes of the developed SSL models.

Pages: 2873-2877

Page count: 5
