Combination of dynamic features with a new mask to optimize neural network speech enhancement

Cited by: 0
Authors
Mei S. [1]
Jia H. [1]
Wang X. [2]
Wu Y. [2]
Affiliations
[1] College of Information and Computer, Taiyuan University of Technology, Taiyuan
[2] Network Optimization Center, China Unicom Shanxi Branch, Taiyuan
Source
Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University | 2021, Vol. 48, No. 3
Keywords
Adaptive mask; Dynamic characteristics; Neural network; Speech enhancement
DOI
10.19665/j.issn1001-2400.2021.03.012
Abstract
Neural network speech enhancement algorithms cannot fully represent the nonlinear structure of speech because of how features are selected, which leads to speech distortion. To address this problem, this paper proposes combining dynamic features with a new mask to optimize neural network speech enhancement. First, three features of noisy speech are extracted and spliced to obtain static features. Their first and second difference derivatives are then computed to capture the instantaneous behavior of the speech signal and are fused into dynamic features. Combining the dynamic and static features makes the features complementary and reduces speech distortion. Second, to improve speech intelligibility and clarity at the same time, an adaptive mask is proposed that can adjust both the energy ratio of speech and noise and the ratio of the traditional mask to the square root mask. Gammatone channel weights are used to modify the mask value in each channel, simulating the human auditory system and further improving speech intelligibility. Finally, simulations with multiple voices under different noise backgrounds show that, compared with algorithms from the literature, the proposed algorithm achieves a higher SNR, subjective speech quality, and short-time objective intelligibility, which verifies its effectiveness. © 2021, The Editorial Board of Journal of Xidian University. All rights reserved.
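The abstract does not give the paper's formulas, so the following is only a minimal sketch of the two ideas it names: appending first and second difference features to static features, and blending a ratio mask with its square root. The function names, the central-difference scheme, and the mixing weight `alpha` are all assumptions made for illustration, not the authors' definitions; the Gammatone channel weighting is not modeled here.

```python
import math


def delta(frames):
    """Central difference along time; edge frames are repeated (assumed scheme)."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def add_dynamic_features(static):
    """Concatenate static, first-difference and second-difference features per frame."""
    d1 = delta(static)
    d2 = delta(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


def blended_mask(speech_energy, noise_energy, alpha=0.5):
    """Mix a ratio mask with its square root.

    alpha is a hypothetical mixing weight: 1.0 gives the plain energy-ratio
    mask, 0.0 gives the square root mask.
    """
    total = speech_energy + noise_energy
    if total == 0.0:
        return 0.0
    ratio = speech_energy / total
    return alpha * ratio + (1.0 - alpha) * math.sqrt(ratio)


# Three frames with two static features each.
static = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
features = add_dynamic_features(static)
mask = blended_mask(3.0, 1.0, alpha=0.5)
```

Each output frame here holds three times as many values as the static frame, and the blended mask always lies between the ratio mask and its square root for energies in this range.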
Pages: 91-98
Number of pages: 7