Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?

Cited by: 1
Authors
Zaiem, Salah [1]
Kemiche, Youcef [2, 3]
Parcollet, Titouan [4, 5]
Essid, Slim [1]
Ravanelli, Mirco [6]
Affiliations
[1] Inst Polytech Paris, Telecom Paris, LTCI, Paris, France
[2] Hi PARIS Engn Team, Paris, France
[3] Capgemini, Paris, France
[4] Samsung AI Ctr, Cambridge, England
[5] Univ Cambridge, Cambridge, England
[6] Concordia Univ, Univ Montreal, Mila Quebec AI Inst, Montreal, PQ, Canada
Source
INTERSPEECH 2023 | 2023
Keywords
self-supervised learning; representation learning
DOI
10.21437/Interspeech.2023-1087
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks using only small amounts of annotated data. The high number of proposed approaches has fostered the need for, and the rise of, extended benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, while the number of considered tasks has been growing, most benchmarks rely on a single decoding architecture that maps the frozen SSL representations to the downstream labels. This work investigates the robustness of such benchmarking results to changes in the decoder architecture. Interestingly, it appears that varying the architecture of the downstream decoder leads to significant variations in the leaderboards of most tasks. Concerningly, our study reveals that benchmarking using limited decoders may cause a counterproductive increase in the sizes of the developed SSL models.
Pages: 2873-2877
Page count: 5