Feature Extraction Methods for Speaker Recognition: A Review

Cited by: 17
Authors
Chaudhary, Gopal [1]
Srivastava, Smriti [1]
Bhardwaj, Saurabh [2]
Affiliations
[1] Univ Delhi, Netaji Subhas Inst Technol, Div Instrumentat & Control Engn, New Delhi, India
[2] Thapar Univ, Patiala, Punjab, India
Keywords
Speaker recognition; feature extraction; additive noise; ROBUST SPEECH RECOGNITION; POSTERIORI LINEAR-REGRESSION; AUTOMATIC SPEAKER; CEPSTRAL COEFFICIENTS; PHASE SPECTRUM; GLOTTAL SOURCE; WAVE-FORM; NOISY; FREQUENCY; ADAPTATION;
DOI
10.1142/S0218001417500410
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents the main research paradigms for feature extraction methods, with the aim of advancing the state of the art in speaker recognition (SR), which has been widely recognized for person identification in security and protection applications. Speaker recognition systems (SRS) have been a widely researched topic for several decades. The basic concept of feature extraction methods is derived from the biological model of the human auditory/vocal tract system. This work provides a classification-oriented review of feature extraction methods for SR over the last 55 years that have proven successful and have become milestones for further research. Broadly, the review is divided into feature extraction methods with and without noise compensation techniques. Feature extraction methods without noise compensation techniques are divided into the following categories: high/low level of feature extraction; type of transform; speech production/auditory system; type of feature extraction technique; time variability; and speech processing techniques. Feature extraction methods with noise compensation techniques are classified into noise-screened features, feature normalization methods, and feature compensation methods. This classification-oriented review should give readers a clear basis for choosing among different techniques and will be helpful for future research in this field.
Pages: 39