Improving Speech Separation with Knowledge Distilled from Self-supervised Pre-trained Models

Cited by: 0

Authors:

Qu, Bowen ^{[1]}

Li, Chenda ^{[1]}

Bai, Jinfeng ^{[2]}

Qian, Yanmin ^{[1]}

Affiliations:

[1] Shanghai Jiao Tong Univ, Inst X LANCE Lab, MoE Key Lab Artificial Intelligence, Dept Comp Sci & Engn, Shanghai, Peoples R China

[2] TAL Educ Grp, Beijing, Peoples R China

Source:

2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2022

Keywords:

Cocktail party problem; Speech Separation; Pre-training Model;

DOI:

10.1109/ISCSLP57327.2022.10038203

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Large-scale self-supervised learning (SSL) models have shown outstanding ability in many speech processing tasks. Most SSL models in the literature are trained on datasets dominated by single-talker utterances, so applying them directly to speech separation may not be optimal. In addition, the high computational cost of large-scale SSL models increases the overall complexity of the speech separation system. In this paper, we explore the application of pre-trained SSL models to the speech separation task. Instead of using the SSL model directly, we design an SSL feature predictor that estimates single-talker deep features from the speech mixture. The SSL feature predictor is trained with knowledge distilled from the pre-trained Wav2Vec 2.0 model. Our experiments show that the performance of time-domain speech separation can be clearly improved by leveraging the SSL embedding predictor.


Pages: 329-333

Number of pages: 5
