SKILL: SIMILARITY-AWARE KNOWLEDGE DISTILLATION FOR SPEECH SELF-SUPERVISED LEARNING

Cited by: 0
Authors
Zampierin, Luca [1 ,2 ]
Hacene, Ghouthi Boukli [1 ,5 ]
Nguyen, Bac [1 ]
Ravanelli, Mirco [3 ,4 ,5 ]
Affiliations
[1] Sony Europe BV, Stuttgart Lab 1, Stuttgart, Germany
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[3] Concordia Univ, Montreal, PQ, Canada
[4] Univ Montreal, Montreal, PQ, Canada
[5] Mila Quebec AI Inst, Montreal, PQ, Canada
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024 | 2024
Keywords
Model compression; self-supervised learning; knowledge distillation
DOI
10.1109/ICASSPW62465.2024.10626978
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Self-supervised learning (SSL) has achieved remarkable success across various speech-processing tasks. To enhance its efficiency, previous works often leverage the use of compression techniques. A notable recent attempt is DPHuBERT, which applies joint knowledge distillation (KD) and structured pruning to learn a significantly smaller SSL model. In this paper, we contribute to this research domain by introducing SKILL, a novel method that conducts distillation across groups of layers instead of distilling individual arbitrarily selected layers within the teacher network. The identification of the layers to distill is achieved through a hierarchical clustering procedure applied to layer similarity measures. Extensive experiments demonstrate that our distilled version of WavLM Base+ not only outperforms DPHuBERT but also achieves state-of-the-art results in the 30M parameters model class across several SUPERB tasks.
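The layer-grouping step the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes linear CKA as the layer similarity measure and average-linkage hierarchical clustering, both plausible but hypothetical choices; the function names (`layer_similarity`, `group_layers`) are invented for this sketch.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def layer_similarity(reps):
    """Pairwise linear CKA between teacher-layer representations.

    reps: list of (frames, dim) arrays, one per teacher layer.
    Returns an (n_layers, n_layers) similarity matrix in [0, 1].
    """
    n = len(reps)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            x = reps[i] - reps[i].mean(axis=0)   # center features
            y = reps[j] - reps[j].mean(axis=0)
            num = np.linalg.norm(x.T @ y, "fro") ** 2
            den = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
            sim[i, j] = num / den
    return sim

def group_layers(sim, n_groups):
    """Hierarchically cluster layers on dissimilarity 1 - CKA.

    Returns an integer cluster label per layer; layers sharing a label
    would be distilled as a group rather than individually.
    """
    dist = 1.0 - sim
    iu = np.triu_indices_from(dist, k=1)          # condensed form for scipy
    z = linkage(dist[iu], method="average")
    return fcluster(z, t=n_groups, criterion="maxclust")
```

Given representations from, say, 12 teacher layers, `group_layers(layer_similarity(reps), 4)` would partition the layers into 4 similarity-based groups from which distillation targets can be drawn.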
Pages: 675-679
Page count: 5