Dual channel neural network speech enhancement algorithm based on time frequency masking

Cited by: 0

Authors:

Jia, Hairong [1]

Mei, Shulin [1]

Zhang, Min [1]

Affiliations:

[1] College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China

Source:

Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition) | 2021 / Vol. 49 / No. 06

Keywords:

Speech intelligibility - Covariance matrix - Signal to noise ratio - Deep neural networks - Reverberation;

DOI:

Not available

Chinese Library Classification (CLC) number:

Subject classification number:

Abstract:

To improve the ability of speech enhancement algorithms to eliminate directional noise and suppress reverberation, a dual-channel neural network speech enhancement algorithm based on time-frequency masking is proposed that combines the advantages of single-channel and multi-channel signal processing. First, the improved multi-resolution cochlear dynamic and static features (DSMRACC) are combined with an adaptive mask (AM) optimized according to the signal-to-noise ratio (SNR), and each of the two microphone signals is enhanced separately by a single-channel deep neural network (DNN), so that the nonlinear features of speech are fully exploited to improve perceptual quality. Second, an AM-based steering vector localization method is proposed to accurately compute the spatial covariance matrix and steering vectors and thereby locate the target speech accurately in noisy and reverberant environments. Finally, the signal is fed to a convolutional beamformer for further denoising and reverberation suppression. Experimental results show that, compared with other speech enhancement algorithms, the speech enhanced by the proposed algorithm has better quality and intelligibility. © 2021 Editorial Board of Journal of Huazhong University of Science and Technology. All rights reserved.
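
The abstract does not give the formulas behind its second step. As a rough illustration only, the following Python sketch shows one common way to turn a time-frequency mask into a spatial covariance matrix and a steering vector for a two-microphone signal: average the mask-weighted outer products of the STFT vectors in one frequency bin, then take the principal eigenvector. The function names, the use of NumPy, the all-ones mask standing in for the AM, and the eigenvector-based estimate are all assumptions made for this example, not the authors' method.

```python
import numpy as np


def masked_spatial_covariance(Y, mask):
    """Mask-weighted spatial covariance for one frequency bin (hypothetical helper).

    Y:    complex array, shape (channels, frames), STFT coefficients.
    mask: real array, shape (frames,), values in [0, 1].
    """
    weighted = Y * mask[np.newaxis, :]       # weight each frame by its mask value
    phi = weighted @ Y.conj().T              # sum over frames of m_t * y_t y_t^H
    return phi / max(mask.sum(), 1e-8)       # normalize by the total mask weight


def steering_vector(phi, ref=0):
    """Principal eigenvector of phi, scaled so channel `ref` has unit gain."""
    _, vecs = np.linalg.eigh(phi)            # eigenvalues are returned in ascending order
    v = vecs[:, -1]                          # eigenvector of the largest eigenvalue
    return v / v[ref]


# Synthetic noise-free check: one source observed by two microphones.
rng = np.random.default_rng(0)
d_true = np.array([1.0, np.exp(-0.7j)])      # assumed relative transfer function
s = rng.standard_normal(200) + 1j * rng.standard_normal(200)
Y = d_true[:, np.newaxis] * s[np.newaxis, :]
mask = np.ones(200)                          # stand-in for a speech mask (assumption)

phi = masked_spatial_covariance(Y, mask)
d_est = steering_vector(phi)
```

In this noise-free toy case the covariance matrix has rank one, so `d_est` recovers `d_true` up to numerical precision. With real recordings the mask would come from the enhancement network, and the estimate would be computed per frequency bin.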

Pages: 43 - 49

50 items in total

[1] Time frequency masking based speech enhancement using deep encoder-decoder neural network
Shi, Wenhua
Zhang, Xiongwei
Zou, Xia
Sun, Meng
Li, Li
Shengxue Xuebao/Acta Acustica, 2020, 45 (03) : 299 - 307
[2] Real-time Multi-channel Speech Enhancement Based on Neural Network Masking with Attention Model
Xue, Cheng
Huang, Weilong
Chen, Weiguang
Feng, Jinwei
INTERSPEECH 2021, 2021 : 1862 - 1866
[3] Time-Frequency Masking Based Online Multi-Channel Speech Enhancement With Convolutional Recurrent Neural Networks
Chakrabarty, Soumitro
Habets, Emanuel A. P.
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (04) : 787 - 799
[4] A Single-channel Speech Enhancement Approach Based on Perceptual Masking Deep Neural Network
Han W.
Zhang X.-W.
Min G.
Zhang Q.-Y.
Zhang, Xiong-Wei (xwzhang9898@163.com), 2017, Science Press (43) : 248 - 258
[5] TIME-FREQUENCY MASKING BASED ONLINE SPEECH ENHANCEMENT WITH MULTI-CHANNEL DATA USING CONVOLUTIONAL NEURAL NETWORKS
Chakrabarty, Soumitro
Wang, DeLiang
Habets, Emanuel A. P.
2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018 : 476 - 480
[6] TIME-FREQUENCY MASKING-BASED SPEECH ENHANCEMENT USING GENERATIVE ADVERSARIAL NETWORK
Soni, Meet H.
Shah, Neil
Patil, Hemant A.
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018 : 5039 - 5043
[7] Segmented Time-Frequency Masking Algorithm for Speech Separation Based on Deep Neural Networks
Guo, Xinyu
Ou, Shifeng
Gao, Meng
Gao, Ying
2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020), 2020 : 445 - 450
[8] PHASE TIME-FREQUENCY MASKING BASED SPEECH ENHANCEMENT ALGORITHM USING CIRCULAR MICROPHONE ARRAY
He, Li
Zhou, Yi
Liu, Hongqing
2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019 : 808 - 813
[9] A time-frequency smoothing neural network for speech enhancement
Yuan, Wenhao
SPEECH COMMUNICATION, 2020, 124 : 75 - 84
[10] A Phase-Based Time-Frequency masking for multi-channel speech enhancement in domestic environments
Brutti, Alessio
Tsiami, Antigoni
Katsamanis, Athanasios
Maragos, Petros
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016 : 2875 - 2879
