SHO based Deep Residual network and hierarchical speech features for speech enhancement

Cited by: 0
Authors
Bhosle M.R. [1 ,2 ]
Narayaswamy N.K. [2 ,3 ]
Affiliations
[1] Electronics and Communication Engineering, Government Engineering College, Raichur
[2] Visvesvaraya Technological University, Belagavi, Karnataka
[3] Department of ECE, Nagarjuna College of Engineering and Technology, Bangalore
Keywords
Bark Frequency Cepstral Coefficients; Deep residual network; Harmony search optimization algorithm; Shuffled Shepherd Optimization Algorithm; Speech enhancement
DOI
10.1007/s10772-022-09972-x
Abstract
Humans frequently find it difficult to understand speech in the presence of real-world noise; external noise degrades the listening comfort of the user, so speech enhancement is required. In this paper, a Shepherd Harmony Optimization (SHO)-based Deep Residual Network (DRN) is developed for speech enhancement. The developed SHO is a combination of the Shuffled Shepherd Optimization Algorithm (SSOA) and Harmony Search optimization (HS). A Hanning window is used for pre-processing of the input data, and Bark Frequency Cepstral Coefficients (BFCC) and the Fractional Delta amplitude modulation spectrogram (FD-AMS) are used for feature extraction. Moreover, the noise present in the speech signal is predicted so that distortions and external interference can be removed. The DRN classifier is then used to enhance the speech signal, and the classifier is trained with the newly devised optimization algorithm. The developed speech enhancement technique achieves better performance, with a Perceptual Evaluation of Speech Quality (PESQ) of 2.646 and a Root Mean Square Error (RMSE) of 0.0067. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
Pages: 355-370
Number of pages: 15
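
The abstract describes a front end of Hanning-window pre-processing followed by BFCC feature extraction. The Python sketch below is a minimal illustration of that kind of front end under simplifying assumptions: the Bark filterbank design, frame length, hop size, and function names (hz_to_bark, bark_filterbank, bfcc) are not taken from the paper, and the FD-AMS features, DRN classifier, and SHO training are not reproduced here.

import numpy as np
from scipy.fftpack import dct

def hz_to_bark(f):
    # Traunmueller approximation of the Bark scale (an assumed choice).
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_filterbank(n_filters, frame_len, sr):
    # Triangular filters spaced uniformly on the Bark scale (simplified design).
    freqs = np.linspace(0.0, sr / 2.0, frame_len // 2 + 1)
    bark = hz_to_bark(freqs)
    edges = np.linspace(bark[0], bark[-1], n_filters + 2)
    fb = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bark - lo) / (mid - lo)
        falling = (hi - bark) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def bfcc(signal, sr, frame_len=400, hop=160, n_filters=24, n_ceps=13):
    # Frame the signal, apply a Hanning window, and compute cepstral
    # coefficients from log Bark-band energies via a DCT.
    window = np.hanning(frame_len)
    fb = bark_filterbank(n_filters, frame_len, sr)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        power_spectrum = np.abs(np.fft.rfft(frame)) ** 2
        band_energy = fb @ power_spectrum + 1e-10
        feats.append(dct(np.log(band_energy), norm='ortho')[:n_ceps])
    return np.array(feats)

if __name__ == "__main__":
    # Toy usage: a 1-second synthetic noisy tone at 16 kHz.
    sr = 16000
    t = np.arange(sr) / sr
    noisy = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(sr)
    print(bfcc(noisy, sr).shape)  # (number of frames, n_ceps)

The 25 ms frame and 10 ms hop are common defaults for speech analysis, not values reported in the paper; the authors' exact BFCC parameterisation and the fractional delta computation for FD-AMS may differ.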