MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training

Cited by: 4
Authors
Karamatli, Ertug [1]
Kirbiz, Serap [2]
Affiliations
[1] Bogazici Univ, Dept Comp Engn, TR-34342 Istanbul, Turkey
[2] MEF Univ, Dept Elect & Elect Engn, TR-34396 Istanbul, Turkey
Keywords
Training; Recording; Source separation; Time-domain analysis; Task analysis; Optimized production technology; Unsupervised learning; Blind source separation; deep learning; self-supervised learning; unsupervised learning
DOI
10.1109/LSP.2022.3232276
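Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract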
We introduce two unsupervised source separation methods, which involve self-supervised training from single-channel two-source speech mixtures. Our first method, mixture permutation invariant training (MixPIT), enables learning a neural network model which separates the underlying sources via a challenging proxy task without supervision from the reference sources. Our second method, cyclic mixture permutation invariant training (MixCycle), uses MixPIT as a building block in a cyclic fashion for continuous learning. MixCycle gradually converts the problem from separating mixtures of mixtures into separating single mixtures. We compare our methods to common supervised and unsupervised baselines: permutation invariant training with dynamic mixing (PIT-DM) and mixture invariant training (MixIT). We show that MixCycle outperforms MixIT and reaches a performance level very close to the supervised baseline (PIT-DM) while circumventing the over-separation issue of MixIT. Also, we propose a self-evaluation technique inspired by MixCycle that estimates model performance without utilizing any reference sources. We show that it yields results consistent with an evaluation on reference sources (LibriMix) and also with an informal listening test conducted on a real-life mixtures dataset (REAL-M).
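The abstract describes MixPIT only at a high level, so the following is a minimal sketch of one plausible reading rather than the authors' implementation. It assumes that two recorded mixtures are summed into a mixture of mixtures, that a two-output model is applied to the sum, and that the two original mixtures serve as targets under a permutation invariant loss. The names `neg_snr`, `pit_loss`, `mixpit_loss`, and `model` are hypothetical, and negative SNR is an assumed loss function that the abstract does not specify.

```python
from itertools import permutations

import numpy as np


def neg_snr(target, estimate, eps=1e-8):
    """Negative signal-to-noise ratio in dB (lower is better). Assumed loss."""
    noise = target - estimate
    ratio = (np.sum(target ** 2) + eps) / (np.sum(noise ** 2) + eps)
    return -10.0 * np.log10(ratio)


def pit_loss(targets, estimates):
    """Return (loss, perm) for the best assignment of estimates to targets.

    perm[i] is the index of the estimate matched to targets[i].
    """
    best_loss, best_perm = None, None
    for perm in permutations(range(len(targets))):
        loss = np.mean([neg_snr(targets[i], estimates[p])
                        for i, p in enumerate(perm)])
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


def mixpit_loss(model, mix_a, mix_b):
    """Assumed MixPIT proxy task: recover two mixtures from their sum.

    `model` is a hypothetical callable mapping one waveform to a list of
    two estimated waveforms; no reference sources are used anywhere.
    """
    mixture_of_mixtures = mix_a + mix_b
    estimates = model(mixture_of_mixtures)
    return pit_loss([mix_a, mix_b], estimates)
```

In an actual training loop the returned loss would be minimized with an automatic differentiation framework; NumPy is used here only to keep the sketch self-contained.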
Pages: 2637-2641
Page count: 5
Related papers
48 in total
  • [1] ON PERMUTATION INVARIANT TRAINING FOR SPEECH SOURCE SEPARATION
    Liu, Xiaoyu
    Pons, Jordi
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6 - 10
  • [2] Probabilistic Permutation Invariant Training for Speech Separation
    Yousefi, Midia
    Khorram, Soheil
    Hansen, John H. L.
    INTERSPEECH 2019, 2019, : 4604 - 4608
  • [3] INTERRUPTED AND CASCADED PERMUTATION INVARIANT TRAINING FOR SPEECH SEPARATION
    Yang, Gene-Ping
    Wu, Szu-Lin
    Mao, Yao-Wen
    Lee, Hung-yi
    Lee, Lin-shan
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6369 - 6373
  • [4] Unsupervised Sound Separation Using Mixture Invariant Training
    Wisdom, Scott
    Tzinis, Efthymios
    Erdogan, Hakan
    Weiss, Ron J.
    Wilson, Kevin
    Hershey, John R.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [5] Overlap Aware Continuous Speech Separation without Permutation Invariant Training
    Yu, Linfeng
    Zhang, Wangyou
    Li, Chenda
    Qian, Yanmin
    INTERSPEECH 2023, 2023, : 3512 - 3516
  • [6] Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation
    Chen, Lianwu
    Yu, Meng
    Qian, Yanmin
    Su, Dan
    Yu, Dong
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 302 - 306
  • [7] Multi-talker Speech Separation Based on Permutation Invariant Training and Beamforming
    Yin, Lu
    Wang, Ziteng
    Xia, Risheng
    Li, Junfeng
    Yan, Yonghong
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 851 - 855
  • [8] Utterance-level Permutation Invariant Training with Discriminative Learning for Single Channel Speech Separation
    Fan, Cunhang
    Liu, Bin
    Tao, Jianhua
    Wen, Zhengqi
    Yi, Jiangyan
    Bai, Ye
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 26 - 30
  • [9] Single-channel speech separation using soft-minimum permutation invariant training
    Yousefi, Midia
    Hansen, John H. L.
    SPEECH COMMUNICATION, 2023, 151 : 76 - 85
  • [10] SPARSE, EFFICIENT, AND SEMANTIC MIXTURE INVARIANT TRAINING: TAMING IN-THE-WILD UNSUPERVISED SOUND SEPARATION
    Wisdom, Scott
    Jansen, Aren
    Weiss, Ron J.
    Erdogan, Hakan
    Hershey, John R.
    2021 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2021, : 51 - 55