Improving Speech Separation with Knowledge Distilled from Self-supervised Pre-trained Models

Cited by: 0
Authors
Qu, Bowen [1]
Li, Chenda [1]
Bai, Jinfeng [2]
Qian, Yanmin [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Inst X LANCE Lab, MoE Key Lab Artificial Intelligence, Dept Comp Sci & Engn, Shanghai, Peoples R China
[2] TAL Educ Grp, Beijing, Peoples R China
Source
2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2022
Keywords
Cocktail party problem; Speech Separation; Pre-training Model;
DOI
10.1109/ISCSLP57327.2022.10038203
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large-scale self-supervised learning (SSL) models have shown outstanding ability in many speech processing tasks. Most SSL models in the literature are trained on datasets dominated by single-talker utterances, so applying them directly to speech separation may not be optimal. In addition, the high computational cost of large-scale SSL models increases the overall complexity of a speech separation system. In this paper, we explore the application of pre-trained SSL models to the speech separation task. Instead of using the SSL model directly, we design an SSL feature predictor that estimates each single talker's deep features from the speech mixture. The SSL feature predictor is trained with knowledge distilled from the pre-trained Wav2Vec 2.0 model. Our experiments show that leveraging the SSL feature predictor clearly improves the performance of time-domain speech separation.
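The distillation idea in the abstract can be illustrated with a toy sketch: a small student predictor sees only mixture frames and is trained to match per-frame target features that a teacher would produce for a single talker. The abstract does not state the predictor architecture or the distillation loss, so the linear student, the mean-squared-error objective, and all names below (`predict`, `distill_step`, `hidden_map`) are assumptions made for illustration. The teacher features here are synthetic stand-ins, not real Wav2Vec 2.0 outputs.

```python
import random


def predict(weights, frames):
    """Hypothetical linear student: maps each input frame to a feature vector."""
    return [[sum(w * x for w, x in zip(row, frame)) for row in weights]
            for frame in frames]


def mse(pred, target):
    """Mean squared error over all frames and feature dimensions."""
    diffs = [(p - t) ** 2
             for p_frame, t_frame in zip(pred, target)
             for p, t in zip(p_frame, t_frame)]
    return sum(diffs) / len(diffs)


def distill_step(weights, mixture_frames, teacher_feats, lr=0.1):
    """One gradient-descent step on the (assumed) MSE distillation loss.

    Updates `weights` in place and returns the loss measured before the update.
    """
    pred = predict(weights, mixture_frames)
    n = len(pred) * len(weights)  # number of elements averaged by mse()
    for j, row in enumerate(weights):
        for i in range(len(row)):
            grad = 2.0 / n * sum(
                (pred[t][j] - teacher_feats[t][j]) * mixture_frames[t][i]
                for t in range(len(pred)))
            row[i] -= lr * grad
    return mse(pred, teacher_feats)


random.seed(0)
T, DIM_IN, DIM_OUT = 20, 4, 3

# Stand-in for frames of the speech mixture (random numbers, not audio).
mixture_frames = [[random.uniform(-1, 1) for _ in range(DIM_IN)]
                  for _ in range(T)]

# Stand-in for teacher features of one talker. In the setting described above
# these would come from a frozen pre-trained model applied to clean speech;
# here they are generated by a hidden linear map so the toy problem is solvable.
hidden_map = [[random.uniform(-1, 1) for _ in range(DIM_IN)]
              for _ in range(DIM_OUT)]
teacher_feats = predict(hidden_map, mixture_frames)

student = [[0.0] * DIM_IN for _ in range(DIM_OUT)]
losses = [distill_step(student, mixture_frames, teacher_feats)
          for _ in range(50)]
print(losses[0], losses[-1])
```

The point of the sketch is the data flow: the teacher is only needed to produce training targets, so at inference time the separation system would run the lightweight student alone, which matches the abstract's motivation of avoiding the cost of the full SSL model.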
Pages: 329-333
Page count: 5