A speaker identification-verification approach for noise-corrupted and improved speech using fusion features and a convolutional neural network

Cited by: 2
Authors
Nisa R. [1 ]
Baba A.M. [1 ]
Affiliations
[1] Department of Electronics and Communication Engineering, Islamic University of Science and Technology, Awantipora, Jammu & Kashmir
Keywords
Convolutional neural network; Feature extraction; Feature fusion; Speaker identification; Speaker verification; Speech enhancement;
DOI
10.1007/s41870-024-01877-z
Abstract
Degraded input speech quality harms speaker recognition techniques. We address speaker recognition from noise-corrupted audio under four noise variants (factory noise, car noise, street-traffic noise, and voice babble), as well as from noise-suppressed enhanced speech. The goal of this research is a speaker recognition algorithm that is robust to a wide range of speech capture qualities, background scenarios, and interferences. Three distinct features are combined: Mel Frequency Cepstral Coefficients (MFCC), Normalized Pitch Frequency (NPF), and Normalized Phase Cepstral Coefficients (NPCC). The fusion rests on the observation that MFCC, NPF, and NPCC capture complementary characteristics of speech. A Convolutional Neural Network (CNN) then learns speaker-dependent attributes from fragments of the Mel, normalized-pitch, and phase-cepstral features of clean, corrupted, and enhanced speech. Performance is measured on the ITU-T test signals and compared with previous algorithms at Signal-to-Noise Ratios of 0 dB, 5 dB, 10 dB, and 15 dB. For enhanced speech, all three features, MFCC, NPF, and NPCC, provided effective speaker identification and verification performance. © Bharati Vidyapeeth's Institute of Computer Applications and Management 2024.
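The feature fusion described in the abstract can be sketched as a frame-wise concatenation of the three feature streams before they are fed to the CNN. The per-frame dimensions below (13 MFCCs, 1 normalized pitch value, 13 phase-cepstral coefficients) are illustrative assumptions, not values stated in the record:

```python
import numpy as np

def fuse_features(mfcc: np.ndarray, npf: np.ndarray, npcc: np.ndarray) -> np.ndarray:
    """Concatenate per-frame features along the coefficient axis.

    mfcc:  (T, 13) Mel Frequency Cepstral Coefficients
    npf:   (T, 1)  Normalized Pitch Frequency
    npcc:  (T, 13) Normalized Phase Cepstral Coefficients
    Returns a (T, 27) fused feature matrix, one row per analysis frame.
    """
    # All three streams must be computed over the same T analysis frames.
    assert mfcc.shape[0] == npf.shape[0] == npcc.shape[0]
    return np.concatenate([mfcc, npf, npcc], axis=1)

# Illustrative random features standing in for real extracted ones.
T = 100  # number of analysis frames
fused = fuse_features(np.random.randn(T, 13),
                      np.random.randn(T, 1),
                      np.random.randn(T, 13))
print(fused.shape)  # (100, 27)
```

Each fused row can then serve as one input slice for the CNN; the exact network layout is not specified in this record.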
Pages: 3493-3501
Page count: 8