WHAMR!: NOISY AND REVERBERANT SINGLE-CHANNEL SPEECH SEPARATION

Cited by: 0
Authors
Maciejewski, Matthew [1,2]
Wichern, Gordon [1]
McQuinn, Emmett [3]
Le Roux, Jonathan [1]
Affiliations
[1] Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA 02139 USA
[2] Johns Hopkins University, Baltimore, MD 21218 USA
[3] Whisperai, San Francisco, CA USA
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Keywords
speech separation; speech enhancement; cocktail party problem; reverberation
DOI
10.1109/icassp40776.2020.9053327
CLC Number
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
While significant advances have been made with respect to the separation of overlapping speech signals, studies have been largely constrained to mixtures of clean, near-anechoic speech, which is not representative of many real-world scenarios. Although the WHAM! dataset introduced noise to the ubiquitous wsj0-2mix dataset, it did not include reverberation, which is generally present in indoor recordings outside of recording studios. The spectral smearing caused by reverberation can result in significant performance degradation for standard deep learning-based speech separation systems, which rely on spectral structure and the sparsity of speech signals to tease apart sources. To address this, we introduce WHAMR!, an augmented version of WHAM! with synthetic reverberated sources, and provide a thorough baseline analysis of current techniques as well as novel cascaded architectures on the newly introduced conditions.
Pages: 696 - 700
Page count: 5
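
As a rough illustration of how the noisy, reverberant two-speaker mixtures described in the abstract are constructed, the sketch below reverberates each source with a room impulse response, sums the reverberant sources, and adds background noise at a target SNR. This is a minimal stand-in, not the official WHAMR! generation code: the released scripts simulate room impulse responses with pyroomacoustics from sampled room geometries and mix WSJ0 utterances with WHAM! ambient noise recordings under their own level and SNR conventions, whereas here the RIR is synthetic decaying noise and the signals are random placeholders.

```python
# Illustrative sketch only: a toy recipe for a noisy-reverberant two-speaker mixture.
# The real WHAMR! pipeline simulates RIRs with pyroomacoustics and mixes WSJ0 speech
# with WHAM! noise; the RIR, signals, and SNR convention below are simplifications.
import numpy as np
from scipy.signal import fftconvolve

SR = 16000  # sampling rate used in this sketch


def toy_rir(rt60=0.4, sr=SR, rng=None):
    """Exponentially decaying white noise as a stand-in for a simulated RIR."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = int(rt60 * sr)
    t = np.arange(n) / sr
    rir = rng.standard_normal(n) * np.exp(-6.9 * t / rt60)  # ~60 dB decay over rt60 s
    return rir / np.max(np.abs(rir))


def mix_noisy_reverberant(sources, noise, snr_db=5.0, rng=None):
    """Reverberate each source, sum them, and add noise at a target SNR (dB)."""
    if rng is None:
        rng = np.random.default_rng(0)
    reverberant = [fftconvolve(s, toy_rir(rng=rng))[: len(s)] for s in sources]
    mixture = np.sum(reverberant, axis=0)
    noise = noise[: len(mixture)]
    # Scale the noise so that 10*log10(speech power / noise power) equals snr_db.
    p_speech = np.mean(mixture ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    noise = noise * np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return mixture + noise, reverberant


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    s1 = rng.standard_normal(4 * SR)       # placeholder for a WSJ0 utterance
    s2 = rng.standard_normal(4 * SR)       # placeholder for a second, overlapping utterance
    ambient = rng.standard_normal(4 * SR)  # placeholder for a WHAM!-style noise clip
    mix, refs = mix_noisy_reverberant([s1, s2], ambient, snr_db=5.0, rng=rng)
    print(mix.shape, [r.shape for r in refs])
```

In the actual corpus, the reverberant sources are kept alongside the anechoic ones, so systems can be trained and evaluated under clean, noisy, reverberant, and noisy-reverberant conditions.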