ENHANCED TIME-FREQUENCY MASKING BY USING NEURAL NETWORKS FOR MONAURAL SOURCE SEPARATION IN REVERBERANT ROOM ENVIRONMENTS

Times cited: 0

Authors:

Sun, Yang [1]

Wang, Wenwu [2]

Chambers, Jonathon A. [1]

Naqvi, Syed Mohsen [1]

Affiliations:

[1] Newcastle Univ, Intelligent Sensing & Commun Res Grp, Newcastle Upon Tyne, Tyne & Wear, England

[2] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford, Surrey, England

Source:

2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO) | 2018

Keywords:

source separation; reverberant room environments; dereverberation; time-frequency mask; SPEECH; RECOGNITION; NOISE

DOI:

Not available

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology]

Subject classification codes:

0808; 0809

Abstract:

Deep neural networks (DNNs) have been used for dereverberation and denoising in the monaural source separation problem. However, the performance of current state-of-the-art methods is limited, particularly in highly reverberant room environments. In this paper, we propose an enhanced time-frequency (T-F) mask to improve separation performance. The ideal enhanced mask (IEM) consists of the dereverberation mask (DM) and the ideal ratio mask (IRM). The DM is specifically applied to eliminate reverberation in the speech mixture, and the IRM helps with denoising. The IEEE and TIMIT corpora, combined with real room impulse responses (RIRs) and noise from the NOISEX dataset, are used to generate speech mixtures for evaluation. The proposed method outperforms the state-of-the-art methods, particularly in highly reverberant and noisy room environments.
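The abstract does not give formulas for the masks. The following is a minimal NumPy sketch that shows how two T-F masks can be combined, assuming the following:

- The IRM uses its common definition with exponent beta = 0.5.
- The DM is a hypothetical magnitude ratio between the direct-path (dry) mixture and the reverberant mixture, clipped to [0, 1].
- The IEM is the element-wise product of the two masks.

These definitions are illustrative assumptions, not necessarily the paper's exact formulation. The names `ideal_ratio_mask`, `dereverberation_mask`, `enhanced_mask` and `EPS` are hypothetical.

```python
import numpy as np

EPS = 1e-8  # assumption: small constant guarding against division by zero in silent T-F bins


def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """Common IRM definition: (|S|^2 / (|S|^2 + |N|^2)) ** beta, bounded in [0, 1]."""
    s2 = np.square(speech_mag)
    n2 = np.square(noise_mag)
    return (s2 / (s2 + n2 + EPS)) ** beta


def dereverberation_mask(direct_mix_mag, reverb_mix_mag):
    """Hypothetical DM: ratio of the direct-path (dry) mixture magnitude to the
    reverberant mixture magnitude, clipped to [0, 1] (the clipping is an assumption)."""
    return np.clip(direct_mix_mag / (reverb_mix_mag + EPS), 0.0, 1.0)


def enhanced_mask(speech_mag, noise_mag, direct_mix_mag, reverb_mix_mag, beta=0.5):
    """Assumed IEM: element-wise product of the DM and the IRM."""
    dm = dereverberation_mask(direct_mix_mag, reverb_mix_mag)
    irm = ideal_ratio_mask(speech_mag, noise_mag, beta)
    return dm * irm


if __name__ == "__main__":
    # Toy 1 x 2 magnitude spectrograms (one frame, two frequency bins).
    speech = np.array([[3.0, 0.0]])
    noise = np.array([[4.0, 1.0]])
    direct_mix = np.array([[2.0, 1.0]])
    reverb_mix = np.array([[4.0, 1.0]])

    iem = enhanced_mask(speech, noise, direct_mix, reverb_mix)
    estimate = iem * reverb_mix  # masked magnitude of the reverberant mixture
    print(np.round(iem, 3))
    print(np.round(estimate, 3))
```

In the toy example, the first bin gets an IRM of 0.6 (sqrt(9 / 25)) and a DM of 0.5 (2 / 4). Their product is an IEM of about 0.3, so the masked estimate of that bin is about 1.2. The second bin contains no speech, so its IRM and IEM are 0. In a DNN-based system, a network would be trained to predict such a mask from the reverberant mixture spectrogram.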


Pages: 1647-1651

Number of pages: 5

50 records in total

[41] Multi-Source DOA Estimation in Reverberant Environments by Jointing Detection and Modeling of Time-Frequency Points
Jia, Maoshen
Wu, Yuxuan
Bao, Changchun
Ritz, Christian
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 379 - 392
[42] Monaural Source Separation Based on Sequentially Trained LSTMs in Real Room Environments
Li, Yi
Sun, Yang
Naqvi, Syed Mohsen
2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019
[43] The importance of time-frequency averaging for binaural speaker localization in reverberant environments
Beit-On, Hanan
Tourbabin, Vladimir
Rafaely, Boaz
INTERSPEECH 2020, 2020 : 5071 - 5075
[44] SEQUENTIALLY TRAINED DNNS BASED MONAURAL SOURCE SEPARATION IN REAL ROOM ENVIRONMENTS
Li, Yi
Sun, Yang
Naqvi, Syed Mohsen
2019 SENSOR SIGNAL PROCESSING FOR DEFENCE CONFERENCE (SSPD), 2019
[45] On Using Time-Frequency Binary Masking For Dereverberation
Mischie, Septimiu
2013 INTERNATIONAL SYMPOSIUM ON SIGNALS, CIRCUITS AND SYSTEMS (ISSCS), 2013
[46] Blind separation of speech mixtures via time-frequency masking
Yilmaz, Ö
Rickard, S
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2004, 52 (07) : 1830 - 1847
[48] Musical Sound Separation Based on Binary Time-Frequency Masking
Li, Yipeng
Wang, DeLiang
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2009
[49] Stereo-input Speech Recognition using Sparseness-based Time-frequency Masking in a Reverberant Environment
Izumi, Yosuke
Nishiki, Kenta
Watanabe, Shinji
Nishimoto, Takuya
Ono, Nobutaka
Sagayama, Shigeki
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009 : 1907 - +
[50] Source Separation of Convolutive and Noisy Mixtures Using Audio-Visual Dictionary Learning and Probabilistic Time-Frequency Masking
Liu, Qingju
Wang, Wenwu
Jackson, Philip J. B.
Barnard, Mark
Kittler, Josef
Chambers, Jonathon
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2013, 61 (22) : 5520 - 5535
