Teacher-Student MixIT for Unsupervised and Semi-supervised Speech Separation

Cited by: 11
Authors
Zhang, Jisi [1]
Zorila, Catalin [2]
Doddipatla, Rama [2]
Barker, Jon [1]
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield, S Yorkshire, England
[2] Toshiba Cambridge Res Lab, Cambridge, England
Source
INTERSPEECH 2021 | 2021
Keywords
semi-supervised learning; speech separation; teacher-student
DOI
10.21437/Interspeech.2021-1243
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
In this paper, we introduce a novel semi-supervised learning framework for end-to-end speech separation. The proposed method first uses mixtures of unseparated sources and the mixture invariant training (MixIT) criterion to train a teacher model. The teacher model then estimates separated sources that are used to train a student model with standard permutation invariant training (PIT). The student model can be fine-tuned with supervised data, i.e., paired artificial mixtures and clean speech sources, and further improved via model distillation. Experiments with single- and multi-channel mixtures show that teacher-student training resolves the over-separation problem observed in the original MixIT method. Further, the semi-supervised performance is comparable to that of a fully supervised separation system trained using ten times the amount of supervised data.
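The two training criteria named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `mixit_loss` and `pit_loss` are hypothetical, mean squared error is assumed in place of whatever signal-level loss the paper uses, and the search over assignments is brute force. MixIT assigns each estimated source to one of two input mixtures so that the remixed estimates best match those mixtures; PIT matches estimates to reference sources under the best permutation.

```python
import itertools

import numpy as np


def mse(a, b):
    """Mean squared error between two signals (assumed loss, for illustration)."""
    return float(np.mean((a - b) ** 2))


def mixit_loss(mixtures, estimates):
    """Brute-force MixIT: mixtures has shape (2, T), estimates has shape (M, T).

    Returns the smallest remix loss and the assignment achieving it, where
    assignment[j] is the index of the mixture that estimate j is added to.
    """
    n_mix, n_est = mixtures.shape[0], estimates.shape[0]
    best_loss, best_assign = np.inf, None
    for assign in itertools.product(range(n_mix), repeat=n_est):
        # Binary mixing matrix: each column has a single 1.
        mixing = np.zeros((n_mix, n_est))
        mixing[list(assign), np.arange(n_est)] = 1.0
        remixed = mixing @ estimates
        loss = sum(mse(mixtures[i], remixed[i]) for i in range(n_mix))
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign


def pit_loss(targets, estimates):
    """Brute-force PIT: targets and estimates both have shape (N, T).

    Returns the smallest mean loss and the permutation achieving it, where
    perm[i] is the index of the estimate matched to target i.
    """
    n = targets.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(n)):
        loss = sum(mse(targets[i], estimates[p]) for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy data: four random "sources" forming two two-source mixtures.
rng = np.random.default_rng(0)
sources = rng.standard_normal((4, 256))
mixtures = np.stack([sources[0] + sources[2], sources[1] + sources[3]])

# A perfect teacher would recover all four sources from the sum of the mixtures.
teacher_loss, teacher_assign = mixit_loss(mixtures, sources)

# A student trained with PIT on pseudo-targets; here its outputs are swapped.
student_loss, student_perm = pit_loss(sources[:2], sources[[1, 0]])
print(teacher_assign, student_perm)
```

In the teacher-student setup described above, the teacher's separated outputs would play the role of `targets` in `pit_loss` when training the student, so the student can have a fixed number of outputs matching the expected number of speakers.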
Pages: 3495-3499
Number of pages: 5