Deep neural network based speech enhancement using mono channel mask

Cited by: 0
Authors
Pallavi P. Ingale
Sanjay L. Nalbalwar
Affiliation
[1] Dr. Babasaheb Ambedkar Technological University
Source
International Journal of Speech Technology | 2019 / Vol. 22
Keywords
Speech enhancement; Mono channel mask; Binary mask; Modified sub-harmonic summation;
DOI
Not available
Chinese Library Classification (CLC) number
Subject classification code
Abstract
Recovering enhanced speech from a noisy speech signal is a task of particular importance in speech processing. Here we propose a deep neural network (DNN) based speech enhancement method that uses a mono channel mask. The proposed method employs a cochleagram to find an initial binary mask. A modified sub-harmonic summation algorithm is then applied to the initial binary mask to obtain an intermediate mask. The spectro-temporal features of this intermediate mask are fed to the DNN, which identifies the correct spectral structure in the frames associated with the target speech; these frames are then used to develop the mono channel mask. The speech signal is reconstructed using the mono channel mask, which avoids unnecessary interference from the noisy time–frequency (T–F) units. Objective evaluations using perceptual evaluation of speech quality (PESQ) and normalized source-to-distortion ratio indicate that the proposed method outperforms state-of-the-art speech enhancement methods. The obtained PESQ values show that the proposed method improves speech quality in noisy conditions. The experimental results demonstrate the effectiveness of the mono channel mask in speech enhancement.
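To illustrate the general idea of masking time-frequency units mentioned in the abstract, the following is a minimal sketch of a standard ideal binary mask, not the paper's mono channel mask. It assumes that separate speech and noise energies per T-F unit are available, which is a simplification; the abstract does not specify how the initial binary mask is derived from the cochleagram. The function names `binary_mask` and `apply_mask` and the threshold parameter `lc_db` are hypothetical, chosen for illustration only.

```python
import math


def binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Return a 0/1 mask with 1 where the local SNR in dB exceeds lc_db.

    speech_energy and noise_energy are lists of frames, each frame a list
    of non-negative energies per frequency channel (assumed layout).
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                keep = False          # no speech energy in this unit
            elif n <= 0.0:
                keep = True           # speech present, no noise
            else:
                keep = 10.0 * math.log10(s / n) > lc_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask


def apply_mask(mixture, mask):
    """Zero out the T-F units of the mixture that the mask rejects."""
    return [[x * m for x, m in zip(x_row, m_row)]
            for x_row, m_row in zip(mixture, mask)]


# Toy 2-frame, 2-channel example (made-up numbers, for illustration only).
speech = [[4.0, 0.1], [0.5, 9.0]]
noise = [[1.0, 1.0], [1.0, 1.0]]
mixture = [[5.0, 1.1], [1.5, 10.0]]

mask = binary_mask(speech, noise)
enhanced = apply_mask(mixture, mask)
print(mask)
print(enhanced)
```

Units dominated by noise are discarded entirely, which is the sense in which a mask can keep noisy T-F units from interfering with the reconstructed signal.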
Pages: 841-850
Number of pages: 9