Enhancement of a text-independent speaker verification system by using feature combination and parallel structure classifiers

Cited by: 11
Authors
Abdalmalak, Kerlos Atia [1, 2]
Gallardo-Antolin, Ascension [2]
Affiliations
[1] Aswan Univ, Elect Engn Dept, Aswan 81542, Egypt
[2] Carlos III Univ Madrid, Signal Theory & Commun Dept, Madrid 28911, Spain
Keywords
Speaker verification; Speech feature extraction; MFCC; BFCC; PLP; RASTA-PLP; SVM; Logistic regression; Feature combination; Classifier combination; Support vector machines; Automatic speech recognition; Word recognition; Identification; Tutorial
DOI
10.1007/s00521-016-2470-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speaker verification (SV) systems mainly involve two stages: feature extraction and classification. In this paper, we explore both modules with the aim of improving the performance of an SV system under noisy conditions. On the one hand, the choice of the most appropriate acoustic features is crucial for robust speaker verification. The acoustic parameters used in the proposed system are Mel Frequency Cepstral Coefficients (MFCC), their first and second derivatives (Deltas and Delta-Deltas), Bark Frequency Cepstral Coefficients (BFCC), Perceptual Linear Predictive (PLP) coefficients, and Relative Spectral Transform Perceptual Linear Predictive (RASTA-PLP) coefficients. We present a complete comparison of different combinations of these features. On the other hand, a major weakness of a conventional support vector machine (SVM) classifier is its reliance on generic, traditional kernel functions to compute the distances among data points, even though the kernel function has a great influence on SVM performance. In this work, we propose combining two SVM-based classifiers that use different kernel functions (a linear kernel and a Gaussian radial basis function kernel) with a logistic regression classifier. The combination is carried out by means of a parallel structure in which different voting rules for taking the final decision are considered. Results show that a significant improvement in the performance of the SV system is achieved by using the combined features with the combined classifiers, both with clean speech and in the presence of noise. Finally, to further enhance the system in noisy environments, we propose including a multiband noise removal technique as a preprocessing stage.
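The parallel structure described in the abstract can be illustrated with a minimal sketch. The abstract does not state which voting rules or which feature-combination scheme the authors use, so everything below is an assumption made for illustration: the function names, the three rules (majority, unanimous, any), and the idea that features are combined by concatenating per-frame vectors are all hypothetical. Each classifier is assumed to output 1 (accept the claimed speaker) or 0 (reject).

```python
from typing import List, Sequence


def combine_features(*feature_vectors: Sequence[float]) -> List[float]:
    """Concatenate per-frame feature vectors (e.g. MFCC + PLP) into one vector.

    Concatenation is an assumed combination scheme, not taken from the paper.
    """
    combined: List[float] = []
    for vector in feature_vectors:
        combined.extend(vector)
    return combined


def majority_vote(decisions: Sequence[int]) -> int:
    """Accept (1) only if strictly more than half of the classifiers accept."""
    return 1 if 2 * sum(decisions) > len(decisions) else 0


def unanimous_vote(decisions: Sequence[int]) -> int:
    """Accept (1) only if every classifier accepts."""
    return 1 if all(d == 1 for d in decisions) else 0


def any_vote(decisions: Sequence[int]) -> int:
    """Accept (1) if at least one classifier accepts."""
    return 1 if any(d == 1 for d in decisions) else 0


if __name__ == "__main__":
    # Hypothetical decisions from three parallel classifiers:
    # linear-kernel SVM, RBF-kernel SVM, logistic regression.
    decisions = [1, 0, 1]
    print("majority:", majority_vote(decisions))
    print("unanimous:", unanimous_vote(decisions))
    print("any:", any_vote(decisions))
```

A stricter rule such as the unanimous vote would be expected to lower false acceptances at the cost of more false rejections, while the "any" rule does the opposite; which trade-off the paper favors is not stated in the abstract.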
Pages: 637-651
Number of pages: 15