Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?

Cited by: 1
Authors
Zaiem, Salah [1]
Kemiche, Youcef [2, 3]
Parcollet, Titouan [4, 5]
Essid, Slim [1]
Ravanelli, Mirco [6]
Affiliations
[1] Inst Polytech Paris, Telecom Paris, LTCI, Paris, France
[2] Hi PARIS Engn Team, Paris, France
[3] Capgemini, Paris, France
[4] Samsung AI Ctr, Cambridge, England
[5] Univ Cambridge, Cambridge, England
[6] Concordia Univ, Univ Montreal, Mila Quebec AI Inst, Montreal, PQ, Canada
Source
INTERSPEECH 2023 | 2023
Keywords
self-supervised learning; representation learning
DOI
10.21437/Interspeech.2023-1087
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks using only small amounts of annotated data. The high number of proposed approaches fostered the need and rise of extended benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, and while the number of considered tasks has been growing, most rely upon a single decoding architecture that maps the frozen SSL representations to the downstream labels. This work investigates the robustness of such benchmarking results to changes in the decoder architecture. Interestingly, it appears that varying the architecture of the downstream decoder leads to significant variations in the leaderboards of most tasks. Concerningly, our study reveals that benchmarking using limited decoders may cause a counterproductive increase in the sizes of the developed SSL models.
Pages: 2873-2877
Page count: 5