Linear-Complexity Self-Supervised Learning for Speech Processing

Citations: 0
Authors
Zhang, Shucong [1 ]
Parcollet, Titouan [1 ]
van Dalen, Rogier [1 ]
Bhattacharya, Sourav [1 ]
Affiliations
[1] Samsung AI Ctr Cambridge, Cambridge, England
Source
INTERSPEECH 2024, 2024
Keywords
self-supervised learning; efficient models
DOI
10.21437/Interspeech.2024-500
Abstract
Self-supervised learning (SSL) models usually require weeks of pre-training on dozens of high-end GPUs. These models typically use a multi-headed self-attention (MHSA) context encoder. However, MHSA takes quadratic time and space in the input length, contributing to the high pre-training cost. Linear-complexity alternatives to MHSA have been proposed. For instance, in supervised training, the SummaryMixing model is the first to outperform MHSA across multiple speech processing tasks. However, these cheaper alternatives have not yet been explored for SSL. This paper studies a linear-complexity context encoder for SSL for the first time. With better or equivalent performance on the downstream tasks of the MP3S benchmark, SummaryMixing reduces the pre-training time and peak VRAM of a wav2vec 2.0 model by 18% and 23%, respectively, allowing a 155M-parameter wav2vec 2.0 model to be pre-trained within one week on 4 Tesla A100 GPUs. Code is available.
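The abstract contrasts MHSA, whose pairwise frame comparisons cost O(T^2) in sequence length T, with SummaryMixing, which replaces attention by a single mean-pooled summary vector shared across all frames, giving O(T) cost. The sketch below illustrates that idea only; the function name, weight shapes, and the concatenation of local and summary branches are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def summary_mixing(x, W_local, W_summary, W_combine):
    """Linear-complexity mixing over a (T, d) sequence x.

    Instead of T x T attention scores, compute one global summary
    vector (a mean of per-frame summaries) and broadcast it back to
    every frame -- cost grows linearly with T.
    """
    local = x @ W_local                       # (T, d): per-frame transform
    summary = (x @ W_summary).mean(axis=0)    # (d,): one global summary, O(T)
    # Each frame sees its local features plus the shared summary.
    mixed = np.concatenate(
        [local, np.broadcast_to(summary, local.shape)], axis=-1
    )                                         # (T, 2d)
    return mixed @ W_combine                  # (T, d)

rng = np.random.default_rng(0)
x = rng.standard_normal((50, 8))              # 50 frames, 8-dim features
W_l = rng.standard_normal((8, 8))
W_s = rng.standard_normal((8, 8))
W_c = rng.standard_normal((16, 8))
y = summary_mixing(x, W_l, W_s, W_c)          # y.shape == (50, 8)
```

Because the summary is a single pooled vector rather than a T x T score matrix, both compute and peak memory scale linearly in T, which is the property the paper exploits to cut wav2vec 2.0 pre-training time and VRAM.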
Pages: 3480-3484 (5 pages)
Related Papers (50 in total)
  • [21] MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION
    Ravanelli, Mirco
    Zhong, Jianyuan
    Pascual, Santiago
    Swietojanski, Pawel
    Monteiro, Joao
    Trmal, Jan
    Bengio, Yoshua
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6989 - 6993
  • [22] Gated Self-supervised Learning for Improving Supervised Learning
    Fuadi, Erland Hillman
    Ruslim, Aristo Renaldo
    Wardhana, Putu Wahyu Kusuma
    Yudistira, Novanto
    2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024, 2024, : 611 - 615
  • [23] Self-Supervised Learning for Recommendation
    Huang, Chao
    Xia, Lianghao
    Wang, Xiang
    He, Xiangnan
    Yin, Dawei
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 5136 - 5139
  • [24] Longitudinal self-supervised learning
    Zhao, Qingyu
    Liu, Zixuan
    Adeli, Ehsan
    Pohl, Kilian M.
    MEDICAL IMAGE ANALYSIS, 2021, 71
  • [25] Quantum self-supervised learning
    Jaderberg, B.
    Anderson, L. W.
    Xie, W.
    Albanie, S.
    Kiffner, M.
    Jaksch, D.
    QUANTUM SCIENCE AND TECHNOLOGY, 2022, 7 (03):
  • [26] Boosting Self-Supervised Embeddings for Speech Enhancement
    Hung, Kuo-Hsuan
    Fu, Szu-Wei
    Tseng, Huan-Hsin
    Chiang, Hsin-Tien
    Tsao, Yu
    Lin, Chii-Wann
    INTERSPEECH 2022, 2022, : 186 - 190
  • [27] End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
    Chang, Xuankai
    Maekaku, Takashi
    Fujita, Yuya
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 3819 - 3823
  • [28] Scaling Effect of Self-Supervised Speech Models
    Pu, Jie
    Yang, Yuguang
    Li, Ruirui
    Elibol, Oguz
    Droppo, Jasha
    INTERSPEECH 2021, 2021, : 1084 - 1088
  • [29] SIMILARITY ANALYSIS OF SELF-SUPERVISED SPEECH REPRESENTATIONS
    Chung, Yu-An
    Belinkov, Yonatan
    Glass, James
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3040 - 3044
  • [30] ON COMPRESSING SEQUENCES FOR SELF-SUPERVISED SPEECH MODELS
    Meng, Yen
    Chen, Hsuan-Jui
    Shi, Jiatong
    Watanabe, Shinji
    Garcia, Paola
    Lee, Hung-yi
    Tang, Hao
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 1128 - 1135