TRAINING NOISY SINGLE-CHANNEL SPEECH SEPARATION WITH NOISY ORACLE SOURCES: A LARGE GAP AND A SMALL STEP

Cited by: 4
Authors
Maciejewski, Matthew [1 ,2 ]
Shi, Jing [1 ,3 ]
Watanabe, Shinji [1 ,2 ]
Khudanpur, Sanjeev [1 ,2 ]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
[3] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
speech separation; noisy speech; deep learning;
DOI
10.1109/ICASSP39728.2021.9413975
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
As the performance of single-channel speech separation systems has improved, there has been a desire to move to more challenging conditions than the clean, near-field speech that initial systems were developed on. When training deep learning separation models, a need for ground truth leads to training on synthetic mixtures. As such, training in noisy conditions requires either using noise synthetically added to clean speech, preventing the use of in-domain data for a noisy-condition task, or training using mixtures of noisy speech, requiring the network to additionally separate the noise. We demonstrate the relative inseparability of noise and that this noisy speech paradigm leads to significant degradation of system performance. We also propose an SI-SDR-inspired training objective that tries to exploit the inseparability of noise to implicitly partition the signal and discount noise separation errors, enabling the training of better separation systems with noisy oracle sources.
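For background on the "SI-SDR-inspired" objective mentioned in the abstract: this record does not give the proposed loss, so the following is only a minimal sketch of the standard scale-invariant signal-to-distortion ratio (SI-SDR) that such objectives build on. It is not the authors' modified objective. The function name `si_sdr` and the `eps` stabilizer are assumptions made for illustration.

```python
import math


def si_sdr(estimate, target, eps=1e-12):
    """Standard SI-SDR in dB between two equal-length signals.

    Background illustration only; NOT the modified objective proposed
    in the paper. `eps` is an assumed stabilizer to avoid division by
    zero and log of zero.
    """
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")

    # Optimal scaling of the target: alpha = <estimate, target> / ||target||^2
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    alpha = dot / (target_energy + eps)

    # Split the estimate into its projection onto the target and a residual.
    projection = [alpha * t for t in target]
    residual = [e - p for e, p in zip(estimate, projection)]

    signal_power = sum(p * p for p in projection)
    residual_power = sum(r * r for r in residual)
    return 10.0 * math.log10((signal_power + eps) / (residual_power + eps))


target = [1.0, 0.0, -1.0, 0.0]
estimate = [1.0, 0.1, -1.0, -0.1]
score = si_sdr(estimate, target)
# Rescaling the estimate should leave the score essentially unchanged.
scaled_score = si_sdr([3.0 * x for x in estimate], target)
```

Because the target is rescaled by the optimal factor before the residual is measured, the metric ignores the overall gain of the estimate. In training, the negated value is commonly used as a loss.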
Pages: 5774-5778
Page count: 5
Related papers
34 in total
[21]   Time-domain adaptive attention network for single-channel speech separation [J].
Kunpeng Wang ;
Hao Zhou ;
Jingxiang Cai ;
Wenna Li ;
Juan Yao .
EURASIP Journal on Audio, Speech, and Music Processing, 2023
[22]   A Pitch State Dependent Dictionary Design Method for Single-Channel Speech Separation [J].
Guo, Haiyan ;
Yang, Zhen ;
Zhang, Linghua ;
Ye, Lei .
2016 8TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS & SIGNAL PROCESSING (WCSP), 2016
[23]   Many-Speakers Single Channel Speech Separation with Optimal Permutation Training [J].
Dovrat, Shaked ;
Nachmani, Eliya ;
Wolf, Lior .
INTERSPEECH 2021, 2021, :3890-3894
[24]   Single-Channel Speech Separation Using Phase Model Based Soft Mask [J].
Lee, Yun-Kyung ;
Kwon, Oh-Wook .
JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2010, 29 (02) :141-147
[25]   DNN TRAINING BASED ON CLASSIC GAIN FUNCTION FOR SINGLE-CHANNEL SPEECH ENHANCEMENT AND RECOGNITION [J].
Tu, Yan-Hui ;
Du, Jun ;
Lee, Chin-Hui .
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, :910-914
[26]   TASNET: TIME-DOMAIN AUDIO SEPARATION NETWORK FOR REAL-TIME, SINGLE-CHANNEL SPEECH SEPARATION [J].
Luo, Yi ;
Mesgarani, Nima .
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, :696-700
[27]   Single-channel speech separation based on long-short frame associated harmonic model [J].
Huang, Qinghua ;
Wang, Dongmei .
DIGITAL SIGNAL PROCESSING, 2011, 21 (04) :497-507
[28]   SINGLE-CHANNEL SPEECH SEPARATION INTEGRATING PITCH INFORMATION BASED ON A MULTI TASK LEARNING FRAMEWORK [J].
Li, Xiang ;
Liu, Rui ;
Song, Tao ;
Wu, Xihong ;
Chen, Jing .
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, :7279-7283
[29]   DUAL-PATH RNN: EFFICIENT LONG SEQUENCE MODELING FOR TIME-DOMAIN SINGLE-CHANNEL SPEECH SEPARATION [J].
Luo, Yi ;
Chen, Zhuo ;
Yoshioka, Takuya .
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, :46-50
[30]   ANALYSIS OF ROBUSTNESS OF DEEP SINGLE-CHANNEL SPEECH SEPARATION USING CORPORA CONSTRUCTED FROM MULTIPLE DOMAINS [J].
Maciejewski, Matthew ;
Sell, Gregory ;
Fujita, Yusuke ;
Garcia-Perera, Leibny Paola ;
Watanabe, Shinji ;
Khudanpur, Sanjeev .
2019 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2019, :165-169