Linear-Complexity Self-Supervised Learning for Speech Processing

Citations: 0
Authors
Zhang, Shucong [1 ]
Parcollet, Titouan [1 ]
van Dalen, Rogier [1 ]
Bhattacharya, Sourav [1 ]
Affiliations
[1] Samsung AI Ctr Cambridge, Cambridge, England
Source
INTERSPEECH 2024, 2024
Keywords
self-supervised learning; efficient models
DOI
10.21437/Interspeech.2024-500
Abstract
Self-supervised learning (SSL) models usually require weeks of pre-training on dozens of high-end GPUs. These models typically use a multi-headed self-attention (MHSA) context encoder. However, MHSA takes quadratic time and space in the input length, contributing to the high pre-training cost. Linear-complexity alternatives to MHSA have been proposed. For instance, in supervised training, the SummaryMixing model is the first to outperform MHSA across multiple speech processing tasks. However, these cheaper alternatives have not yet been explored for SSL. This paper studies a linear-complexity context encoder for SSL for the first time. With better or equivalent performance on the downstream tasks of the MP3S benchmark, SummaryMixing reduces the pre-training time and peak VRAM of a wav2vec 2.0 model by 18% and 23%, respectively, allowing a 155M-parameter wav2vec 2.0 model to be pre-trained within one week on 4 Tesla A100 GPUs. Code is available.
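The abstract contrasts MHSA, whose pairwise frame comparisons cost O(T^2) in sequence length T, with SummaryMixing, which replaces attention by a single mean-pooled summary vector shared across all frames, giving O(T) cost. The sketch below illustrates that idea only; the function name, weight shapes, and the concatenation of local and summary branches are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def summary_mixing(x, W_local, W_summary, W_combine):
    """Linear-complexity mixing over a (T, d) sequence x.

    Instead of T x T attention scores, compute one global summary
    vector (a mean of per-frame summaries) and broadcast it back to
    every frame -- cost grows linearly with T.
    """
    local = x @ W_local                       # (T, d): per-frame transform
    summary = (x @ W_summary).mean(axis=0)    # (d,): one global summary, O(T)
    # Each frame sees its local features plus the shared summary.
    mixed = np.concatenate(
        [local, np.broadcast_to(summary, local.shape)], axis=-1
    )                                         # (T, 2d)
    return mixed @ W_combine                  # (T, d)

rng = np.random.default_rng(0)
x = rng.standard_normal((50, 8))              # 50 frames, 8-dim features
W_l = rng.standard_normal((8, 8))
W_s = rng.standard_normal((8, 8))
W_c = rng.standard_normal((16, 8))
y = summary_mixing(x, W_l, W_s, W_c)          # y.shape == (50, 8)
```

Because the summary is a single pooled vector rather than a T x T score matrix, both compute and peak memory scale linearly in T, which is the property the paper exploits to cut wav2vec 2.0 pre-training time and VRAM.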
Pages: 3480-3484 (5 pages)
Related Papers (50 in total)
  • [21] MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION
    Ravanelli, Mirco
    Zhong, Jianyuan
    Pascual, Santiago
    Swietojanski, Pawel
    Monteiro, Joao
    Trmal, Jan
    Bengio, Yoshua
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6989 - 6993
  • [22] Gated Self-supervised Learning for Improving Supervised Learning
    Fuadi, Erland Hillman
    Ruslim, Aristo Renaldo
    Wardhana, Putu Wahyu Kusuma
    Yudistira, Novanto
    2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024, 2024, : 611 - 615
  • [23] Self-Supervised Learning for Recommendation
    Huang, Chao
    Xia, Lianghao
    Wang, Xiang
    He, Xiangnan
    Yin, Dawei
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 5136 - 5139
  • [24] Longitudinal self-supervised learning
    Zhao, Qingyu
    Liu, Zixuan
    Adeli, Ehsan
    Pohl, Kilian M.
    MEDICAL IMAGE ANALYSIS, 2021, 71
  • [25] Quantum self-supervised learning
    Jaderberg, B.
    Anderson, L. W.
    Xie, W.
    Albanie, S.
    Kiffner, M.
    Jaksch, D.
    QUANTUM SCIENCE AND TECHNOLOGY, 2022, 7 (03):
  • [26] Boosting Self-Supervised Embeddings for Speech Enhancement
    Hung, Kuo-Hsuan
    Fu, Szu-Wei
    Tseng, Huan-Hsin
    Chiang, Hsin-Tien
    Tsao, Yu
    Lin, Chii-Wann
    INTERSPEECH 2022, 2022, : 186 - 190
  • [27] End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
    Chang, Xuankai
    Maekaku, Takashi
    Fujita, Yuya
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 3819 - 3823
  • [28] Scaling Effect of Self-Supervised Speech Models
    Pu, Jie
    Yang, Yuguang
    Li, Ruirui
    Elibol, Oguz
    Droppo, Jasha
    INTERSPEECH 2021, 2021, : 1084 - 1088
  • [29] SIMILARITY ANALYSIS OF SELF-SUPERVISED SPEECH REPRESENTATIONS
    Chung, Yu-An
    Belinkov, Yonatan
    Glass, James
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3040 - 3044
  • [30] ON COMPRESSING SEQUENCES FOR SELF-SUPERVISED SPEECH MODELS
    Meng, Yen
    Chen, Hsuan-Jui
    Shi, Jiatong
    Watanabe, Shinji
    Garcia, Paola
    Lee, Hung-yi
    Tang, Hao
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 1128 - 1135