SUPERB @ SLT 2022: CHALLENGE ON GENERALIZATION AND EFFICIENCY OF SELF-SUPERVISED SPEECH REPRESENTATION LEARNING

Cited by: 9
Authors
Feng, Tzu-Hsun [1]
Dong, Annie [2]
Yeh, Ching-Feng [2]
Yang, Shu-Wen [1]
Lin, Tzu-Quan [1]
Shi, Jiatong
Chang, Kai-Wei [1]
Huang, Zili [4]
Wu, Haibin [1]
Chang, Xuankai [3]
Watanabe, Shinji [3]
Mohamed, Abdelrahman [2]
Li, Shang-Wen [2]
Lee, Hung-Yi [1]
Affiliations
[1] Natl Taiwan Univ, Taipei City, Taiwan
[2] Meta, Menlo Pk, CA USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[4] Johns Hopkins Univ, Baltimore, MD 21218 USA
Source
2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2022
Keywords
Self-supervised Learning; Pre-training; Network Compression
DOI
10.1109/SLT54892.2023.10022770
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present the SUPERB challenge at SLT 2022, which aims at learning self-supervised speech representations for better performance, generalization, and efficiency. The challenge builds upon the SUPERB benchmark and implements metrics to measure the computation requirements of self-supervised learning (SSL) representations and to evaluate their generalizability and performance across the diverse SUPERB tasks. The SUPERB benchmark provides comprehensive coverage of popular speech processing tasks, from speech and speaker recognition to audio generation and semantic understanding. As SSL has gained interest in the speech community and shown promising outcomes, we envision the challenge increasing the impact of SSL techniques by motivating more practical designs that look beyond task performance. We summarize the results of 14 submitted models in this paper. We also discuss the main findings from those submissions and the future directions of SSL research.
Pages: 1096 - 1103
Number of pages: 8