ENHANCED TIME-FREQUENCY MASKING BY USING NEURAL NETWORKS FOR MONAURAL SOURCE SEPARATION IN REVERBERANT ROOM ENVIRONMENTS

Authors
Sun, Yang [1]
Wang, Wenwu [2]
Chambers, Jonathon A. [1]
Naqvi, Syed Mohsen [1]
Affiliations
[1] Newcastle Univ, Intelligent Sensing & Commun Res Grp, Newcastle Upon Tyne, Tyne & Wear, England
[2] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford, Surrey, England
Source
2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO) | 2018
Keywords
source separation; reverberant room environments; dereverberation; time-frequency mask; SPEECH; RECOGNITION; NOISE;
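DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject classification codes
0808; 0809
Abstract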
Deep neural networks (DNNs) have been used for dereverberation and denoising in the monaural source separation problem. However, the performance of current state-of-the-art methods is limited, particularly when applied in highly reverberant room environments. In this paper, we propose an enhanced time-frequency (T-F) mask to improve the separation performance. The ideal enhanced mask (IEM) consists of the dereverberation mask (DM) and the ideal ratio mask (IRM). The DM is specifically applied to eliminate the reverberations in the speech mixture and the IRM helps in denoising. The IEEE and the TIMIT corpora with real room impulse responses (RIRs) and noise from the NOISEX dataset are used to generate speech mixtures for evaluations. The proposed method outperforms the state-of-the-art methods, particularly in highly reverberant and noisy room environments.
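The abstract names the two components of the enhanced mask but does not give their formulas. The following is a minimal sketch of how such a two-stage mask could be composed on magnitude spectrograms, not the authors' implementation. It assumes the commonly used square-root IRM definition, a DM defined as the ratio of the anechoic (direct-path) mixture magnitude to the reverberant mixture magnitude, and a combination by element-wise product. The function names, the `eps` stabilizer, and the toy data are all hypothetical.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Square-root IRM (a common definition, assumed here, not taken from the abstract)."""
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return np.sqrt(s2 / (s2 + n2 + eps))

def dereverberation_mask(anechoic_mix_mag, reverberant_mix_mag, eps=1e-8):
    """Assumed DM: ratio of anechoic to reverberant mixture magnitude.

    Values may exceed 1; no clipping is applied in this sketch.
    """
    return anechoic_mix_mag / (reverberant_mix_mag + eps)

def apply_enhanced_mask(reverberant_mix_mag, dm, irm):
    """Assumed combination: element-wise product of DM and IRM applied to the mixture."""
    return reverberant_mix_mag * dm * irm

# Toy magnitude spectrograms (frequency bins x time frames), purely illustrative.
rng = np.random.default_rng(0)
speech = rng.uniform(0.1, 1.0, size=(4, 5))
noise = rng.uniform(0.1, 1.0, size=(4, 5))
# Assumption: speech and noise are uncorrelated, so their powers add.
anechoic_mix = np.sqrt(speech ** 2 + noise ** 2)
reverberant_mix = rng.uniform(1.0, 2.0, size=(4, 5))

irm = ideal_ratio_mask(speech, noise)
dm = dereverberation_mask(anechoic_mix, reverberant_mix)
estimate = apply_enhanced_mask(reverberant_mix, dm, irm)
```

With ideal (oracle) masks computed this way, the DM maps the reverberant mixture back to the anechoic mixture and the IRM then scales it down to the clean speech magnitude. In a learned system, a network would estimate the masks from the reverberant mixture alone.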
Pages: 1647-1651
Number of pages: 5