MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training

Cited by: 4

Authors:

Karamatli, Ertug [1]

Kirbiz, Serap [2]

Affiliations:

[1] Bogazici Univ, Dept Comp Engn, TR-34342 Istanbul, Turkey

[2] MEF Univ, Dept Elect & Elect Engn, TR-34396 Istanbul, Turkey

Source:

IEEE SIGNAL PROCESSING LETTERS | 2022 / Vol. 29

Keywords:

Training; Recording; Source separation; Time-domain analysis; Task analysis; Optimized production technology; Unsupervised learning; Blind source separation; deep learning; self-supervised learning; unsupervised learning;

DOI:

10.1109/LSP.2022.3232276

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

We introduce two unsupervised source separation methods, which involve self-supervised training from single-channel two-source speech mixtures. Our first method, mixture permutation invariant training (MixPIT), enables learning a neural network model which separates the underlying sources via a challenging proxy task without supervision from the reference sources. Our second method, cyclic mixture permutation invariant training (MixCycle), uses MixPIT as a building block in a cyclic fashion for continuous learning. MixCycle gradually converts the problem from separating mixtures of mixtures into separating single mixtures. We compare our methods to common supervised and unsupervised baselines: permutation invariant training with dynamic mixing (PIT-DM) and mixture invariant training (MixIT). We show that MixCycle outperforms MixIT and reaches a performance level very close to the supervised baseline (PIT-DM) while circumventing the over-separation issue of MixIT. Also, we propose a self-evaluation technique inspired by MixCycle that estimates model performance without utilizing any reference sources. We show that it yields results consistent with an evaluation on reference sources (LibriMix) and also with an informal listening test conducted on a real-life mixtures dataset (REAL-M).
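The abstract does not spell out the training objective, so the following is only a minimal sketch of what a permutation invariant loss over a mixture of mixtures could look like. It assumes that the proxy task feeds the sum of two mixtures to a two-output model and scores the outputs against the two original mixtures under the best assignment. The function names (`neg_snr_db`, `pit_loss`), the negative-SNR loss, and the toy signals are all illustrative assumptions, not the authors' implementation.

```python
import itertools
import math


def neg_snr_db(target, estimate, eps=1e-8):
    """Negative signal-to-noise ratio in dB (lower is better). Assumed loss."""
    signal = sum(t * t for t in target)
    noise = sum((t - e) ** 2 for t, e in zip(target, estimate))
    return -10.0 * math.log10((signal + eps) / (noise + eps))


def pit_loss(targets, estimates):
    """Return (loss, perm) for the assignment of estimates to targets
    that minimizes the mean loss; perm[i] is the estimate index paired
    with targets[i]."""
    best = None
    for perm in itertools.permutations(range(len(estimates))):
        loss = sum(
            neg_snr_db(t, estimates[p]) for t, p in zip(targets, perm)
        ) / len(targets)
        if best is None or loss < best[0]:
            best = (loss, perm)
    return best


# Two hypothetical single-channel mixtures (toy values, not real audio).
mix_a = [0.5, -0.2, 0.1, 0.4]
mix_b = [-0.3, 0.6, 0.2, -0.1]

# Assumed proxy-task input: the sum of the two mixtures.
mixture_of_mixtures = [a + b for a, b in zip(mix_a, mix_b)]

# Stand-in for a model's two outputs: near-perfect, but in swapped order.
estimates = [[b + 0.01 for b in mix_b], [a - 0.01 for a in mix_a]]

loss, perm = pit_loss([mix_a, mix_b], estimates)
print(perm)  # (1, 0): mix_a pairs with output 1, mix_b with output 0
```

Because the loss takes the minimum over assignments, the output order of the model does not matter, which is the property that "permutation invariant" refers to. The cyclic step of MixCycle is not sketched here, since the abstract gives too little detail to reconstruct it.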


Pages: 2637-2641

Page count: 5
