Teacher-Student MixIT for Unsupervised and Semi-supervised Speech Separation

Cited by: 11

Authors:

Zhang, Jisi [1]

Zorila, Catalin [2]

Doddipatla, Rama [2]

Barker, Jon [1]

Affiliations:

[1] Univ Sheffield, Dept Comp Sci, Sheffield, S Yorkshire, England

[2] Toshiba Cambridge Res Lab, Cambridge, England

Source:

INTERSPEECH 2021 | 2021

Keywords:

semi-supervised learning; speech separation; teacher-student

DOI:

10.21437/Interspeech.2021-1243

Chinese Library Classification (CLC) numbers:

R36 [Pathology]; R76 [Otorhinolaryngology]

Subject classification codes:

100104; 100213

Abstract:

In this paper, we introduce a novel semi-supervised learning framework for end-to-end speech separation. The proposed method first uses mixtures of unseparated sources and the mixture invariant training (MixIT) criterion to train a teacher model. The teacher model then estimates separated sources that are used to train a student model with standard permutation invariant training (PIT). The student model can be fine-tuned with supervised data, i.e., paired artificial mixtures and clean speech sources, and further improved via model distillation. Experiments with single- and multi-channel mixtures show that the teacher-student training resolves the over-separation problem observed in the original MixIT method. Further, the semi-supervised performance is comparable to that of a fully supervised separation system trained using ten times the amount of supervised data.
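The abstract names two training criteria: MixIT for the teacher and PIT for the student. As a rough illustration only, and not the authors' implementation, the sketch below computes both on plain Python lists. PIT scores the best one-to-one pairing of estimated sources with reference sources. MixIT assigns each estimated source to one of the input mixtures, sums the estimates per mixture, and scores the best such remix against the mixtures. Assumptions: plain negative SNR in dB as the per-signal loss (the original MixIT work uses a thresholded SNR variant), brute-force search over pairings and assignments, and hypothetical helper names `neg_snr`, `pit_loss`, and `mixit_loss`. The `__main__` demo uses made-up 4-sample signals to show how the teacher stage (MixIT on a mixture of mixtures) and the student stage (PIT against the teacher's estimates as pseudo-targets) would use these losses.

```python
import itertools
import math
from typing import List, Sequence

Signal = List[float]


def neg_snr(reference: Sequence[float], estimate: Sequence[float], eps: float = 1e-8) -> float:
    """Negative signal-to-noise ratio in dB (lower is better)."""
    signal_power = sum(r * r for r in reference)
    error_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return -10.0 * math.log10((signal_power + eps) / (error_power + eps))


def pit_loss(references: Sequence[Signal], estimates: Sequence[Signal]) -> float:
    """PIT: average loss under the best one-to-one pairing of estimates with references."""
    n = len(references)
    best = math.inf
    for perm in itertools.permutations(range(n)):
        loss = sum(neg_snr(references[i], estimates[perm[i]]) for i in range(n)) / n
        best = min(best, loss)
    return best


def mixit_loss(mixtures: Sequence[Signal], estimates: Sequence[Signal]) -> float:
    """MixIT: assign every estimated source to exactly one input mixture, sum the
    estimates assigned to each mixture, and keep the best-scoring assignment.
    A mixture may receive no estimates (its remix is then silence)."""
    n_mix = len(mixtures)
    length = len(mixtures[0])
    best = math.inf
    # assignment[j] is the index of the mixture that estimate j is added to.
    for assignment in itertools.product(range(n_mix), repeat=len(estimates)):
        remixed = [[0.0] * length for _ in range(n_mix)]
        for j, k in enumerate(assignment):
            for t in range(length):
                remixed[k][t] += estimates[j][t]
        loss = sum(neg_snr(mixtures[k], remixed[k]) for k in range(n_mix)) / n_mix
        best = min(best, loss)
    return best


if __name__ == "__main__":
    # Toy "sources" (hypothetical 4-sample signals).
    s1 = [1.0, 0.0, -1.0, 0.5]
    s2 = [0.0, 2.0, 0.0, -1.0]
    s3 = [0.5, 0.5, 0.5, 0.5]
    mix1 = [a + b for a, b in zip(s1, s2)]  # an unseparated two-source mixture
    mix2 = s3                                # a second mixture holding one source

    # Teacher stage: the teacher would see mix1 + mix2 (a mixture of mixtures)
    # and be trained with MixIT against mix1 and mix2. Here we only score a
    # hypothetical perfect output with 4 slots, one of them silent.
    teacher_estimates = [s3, s2, [0.0] * 4, s1]
    print("MixIT loss:", mixit_loss([mix1, mix2], teacher_estimates))

    # Student stage: teacher estimates act as pseudo-targets for PIT, so the
    # order in which the student emits sources does not matter.
    pseudo_targets = [s1, s2]
    student_estimates = [s2, s1]
    print("PIT loss:", pit_loss(pseudo_targets, student_estimates))
```

Both searches are brute force (n! pairings for PIT, N^M assignments of M estimates to N mixtures for MixIT), which is only practical for the small source counts typical of speech separation; a real training setup would evaluate these losses on batched tensors inside a deep-learning framework.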


Pages: 3495-3499

Number of pages: 5
