Speech Enhancement via Mask-Mapping Based Residual Dense Network

Cited by: 1
Authors
Zhou, Lin [1]
Chen, Xijin [1]
Wu, Chaoyan [1]
Zhong, Qiuyue [1]
Cheng, Xu [2]
Tang, Yibin [3]
Affiliations
[1] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China
[2] Univ Oulu, Ctr Machine Vis & Signal Anal, FI-90014 Oulu, Finland
[3] Hohai Univ, Coll IOT Engn, Changzhou 213022, Peoples R China
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2023, Vol. 74, No. 1
Funding
National Natural Science Foundation of China
Keywords
Mask-mapping-based method; residual dense block; speech enhancement; ALGORITHM; NOISE;
DOI
10.32604/cmc.2023.027379
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Masking-based and spectrum mapping-based methods are the two main approaches to speech enhancement with deep neural networks (DNNs). However, mapping-based methods use only the phase of the noisy speech, which limits the upper bound of speech enhancement performance, while masking-based methods must estimate the mask accurately, which remains the key problem. Combining the advantages of these two types of methods, this paper proposes the speech enhancement algorithm MM-RDN (masking-mapping residual dense network), based on masking-mapping (MM) and a residual dense network (RDN). Using the logarithmic power spectrogram (LPS) of consecutive frames, MM estimates the ideal ratio mask (IRM) matrix of those consecutive frames. RDN can make full use of the feature maps of all layers. Meanwhile, by using global residual learning to combine shallow and deep features, RDN obtains global dense features from the LPS, thereby improving the estimation accuracy of the IRM matrix. Simulations show that the proposed method achieves attractive speech enhancement performance in various acoustic environments. Specifically, in untrained acoustic tests with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and unmatched noise category, MM-RDN still outperforms the existing convolutional recurrent network (CRN) method in terms of perceptual evaluation of speech quality (PESQ) and other evaluation indexes. This indicates that the proposed algorithm generalizes better to untrained conditions.
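The two quantities named in the abstract, the LPS input and the IRM target, can be illustrated with a minimal sketch. The abstract does not give formulas, so the square-root IRM form used here is one common definition and is an assumption, as are the helper names, the `EPS` floor, and the toy power values; this is not the authors' implementation.

```python
import math

EPS = 1e-12  # hypothetical floor to avoid log(0) and division by zero


def log_power_spectrum(power):
    """Natural-log power spectrogram for a frames x frequency-bins list."""
    return [[math.log(max(p, EPS)) for p in frame] for frame in power]


def ideal_ratio_mask(speech_power, noise_power):
    """Assumed IRM form: sqrt(S / (S + N)) per time-frequency bin."""
    return [
        [math.sqrt(s / max(s + n, EPS)) for s, n in zip(s_frame, n_frame)]
        for s_frame, n_frame in zip(speech_power, noise_power)
    ]


def apply_mask(noisy_power, mask):
    """Scale the noisy power by the squared mask to estimate clean power."""
    return [
        [y * m * m for y, m in zip(y_frame, m_frame)]
        for y_frame, m_frame in zip(noisy_power, mask)
    ]


# Toy power spectrograms: 2 frames x 3 frequency bins (made-up values).
speech = [[4.0, 1.0, 0.0], [9.0, 0.0, 1.0]]
noise = [[0.0, 1.0, 2.0], [3.0, 1.0, 1.0]]
# Assumes speech and noise are uncorrelated, so their powers add.
noisy = [[s + n for s, n in zip(sf, nf)] for sf, nf in zip(speech, noise)]

lps = log_power_spectrum(noisy)       # network input feature
irm = ideal_ratio_mask(speech, noise)  # training target in [0, 1]
enhanced = apply_mask(noisy, irm)      # recovers the speech power here
```

In a trained system the mask would be predicted from the LPS rather than computed from the clean speech and noise, which are only available when building training targets.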
Pages: 1259-1277
Page count: 19