ON PERMUTATION INVARIANT TRAINING FOR SPEECH SOURCE SEPARATION

Authors
Liu, Xiaoyu [1]
Pons, Jordi [1]
Affiliation
[1] Dolby Labs, San Francisco, CA 94103 USA
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
Speech source separation; permutation invariant training; waveform-based models; spectrogram-based models; FILTERBANK;
DOI
10.1109/ICASSP39728.2021.9413559
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
We study permutation invariant training (PIT), which addresses the permutation ambiguity problem in speaker-independent source separation models. We extend two state-of-the-art PIT strategies. First, we look at the two-stage speaker separation and tracking algorithm based on frame-level PIT (tPIT) and clustering, which was originally proposed for the STFT domain, and we adapt it to work with waveforms and over a learned latent space. Further, we propose an efficient clustering loss that scales to waveform models. Second, we extend a recently proposed auxiliary speaker-ID loss with a deep feature loss based on "problem agnostic speech features", to reduce the local permutation errors made by utterance-level PIT (uPIT). Our results show that the proposed extensions help reduce permutation ambiguity. However, we also note that the studied STFT-based models are more effective at reducing permutation errors than waveform-based models, a perspective overlooked in recent studies.
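As a rough illustration of the two PIT variants named in the abstract, the sketch below contrasts an utterance-level assignment (one speaker permutation for the whole utterance) with a frame-level assignment (one permutation per frame). It is not the authors' implementation: the function names (`frame_error`, `perm_error_at`, `upit_loss`, `tpit_loss`), the squared-error criterion, and the toy data are all assumptions made for this example.

```python
from itertools import permutations

# Hypothetical helpers for illustration only; the squared-error criterion
# is an assumption, not the loss used in the paper.

def frame_error(est_frame, tgt_frame):
    """Sum of squared differences between two feature frames."""
    return sum((e - t) ** 2 for e, t in zip(est_frame, tgt_frame))

def perm_error_at(estimates, targets, perm, t):
    """Error at frame t when estimate i is matched to target perm[i]."""
    return sum(frame_error(estimates[i][t], targets[p][t])
               for i, p in enumerate(perm))

def upit_loss(estimates, targets):
    """Utterance-level PIT: choose ONE permutation for all frames."""
    n_src, n_frames = len(estimates), len(estimates[0])

    def total(perm):
        return sum(perm_error_at(estimates, targets, perm, t)
                   for t in range(n_frames))

    best = min(permutations(range(n_src)), key=total)
    return total(best), best

def tpit_loss(estimates, targets):
    """Frame-level PIT: choose the best permutation independently per frame."""
    n_src, n_frames = len(estimates), len(estimates[0])
    loss, perms = 0.0, []
    for t in range(n_frames):
        best = min(permutations(range(n_src)),
                   key=lambda p: perm_error_at(estimates, targets, p, t))
        loss += perm_error_at(estimates, targets, best, t)
        perms.append(best)
    return loss, perms

# Toy data: 2 sources, 2 frames, 1 feature per frame.
# The estimates swap the two speakers in the second frame.
tgt = [[[1.0], [1.0]], [[0.0], [0.0]]]
est = [[[1.0], [0.0]], [[0.0], [1.0]]]

u_loss, u_perm = upit_loss(est, tgt)
t_loss, t_perms = tpit_loss(est, tgt)
```

On this toy input the frame-level loss is zero, but the chosen permutation differs between the two frames, which is why a frame-level approach needs a separate tracking or clustering stage to stitch frames into consistent speaker streams. The utterance-level loss cannot reach zero here because a single permutation cannot account for the mid-utterance swap, a small-scale picture of a local permutation error.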
Pages: 6-10
Number of pages: 5
Related papers
50 in total
  • [21] Many-Speakers Single Channel Speech Separation with Optimal Permutation Training
    Dovrat, Shaked
    Nachmani, Eliya
    Wolf, Lior
    INTERSPEECH 2021, 2021, : 3890 - 3894
  • [22] END-TO-END MICROPHONE PERMUTATION AND NUMBER INVARIANT MULTI-CHANNEL SPEECH SEPARATION
    Luo, Yi
    Chen, Zhuo
    Mesgarani, Nima
    Yoshioka, Takuya
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6394 - 6398
  • [23] Partial separation method for solving permutation problem in frequency domain blind source separation of speech signals
    Reju, V. G.
    Koh, Soo Ngee
    Soon, Ing Yann
    NEUROCOMPUTING, 2008, 71 (10-12) : 2098 - 2112
  • [24] ADAPTIVE PERMUTATION INVARIANT TRAINING WITH AUXILIARY INFORMATION FOR MONAURAL MULTI-TALKER SPEECH RECOGNITION
    Chang, Xuankai
    Qian, Yanmin
    Yu, Dong
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5974 - 5978
  • [25] Utterance-level Permutation Invariant Training with Latency-controlled BLSTM for Single-channel Multi-talker Speech Separation
    Huang, Lu
    Cheng, Gaofeng
    Zhang, Pengyuan
    Yang, Yi
    Xu, Shumin
    Sun, Jiasong
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1256 - 1261
  • [26] Geometrically Constrained Permutation-free Source Separation in an Undercomplete Speech Unmixing Scenario
    Visser, Erik
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2626 - 2629
  • [27] Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers
    von Neumann, Thilo
    Kinoshita, Keisuke
    Boeddeker, Christoph
    Delcroix, Marc
    Haeb-Umbach, Reinhold
    INTERSPEECH 2021, 2021, : 3490 - 3494
  • [28] KNOWLEDGE TRANSFER IN PERMUTATION INVARIANT TRAINING FOR SINGLE-CHANNEL MULTI-TALKER SPEECH RECOGNITION
    Tan, Tian
    Qian, Yanmin
    Yu, Dong
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5714 - 5718
  • [29] The Improved Method for Solving Permutation Problem in Frequency Domain Blind Source Separation of Speech Signals
    Zhang Dexiang
    Wu Xiaopei
    Lv Zhao
    Guo Xiaojing
    MATERIALS SCIENCE AND INFORMATION TECHNOLOGY, PTS 1-8, 2012, 433-440 : 7029 - 7034
  • [30] LEARNING INVARIANT FEATURES FOR SPEECH SEPARATION
    Han, Kun
    Wang, DeLiang
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7492 - 7496