SPEAKER IDENTIFICATION BY AGGREGATING GAUSSIAN MIXTURE MODELS (GMMs) BASED ON UNCORRELATED MFCC-DERIVED FEATURES

Times cited: 4
Authors
Pal, Amita [1]
Bose, Smarajit [1]
Basak, Gopal K. [2]
Mukhopadhyay, Amitava [3]
Affiliations
[1] Indian Statistical Institute, Applied Statistics Division, Kolkata, India
[2] Indian Statistical Institute, Theoretical Statistics and Mathematics Division, Kolkata, India
[3] Interra Information Technologies, Kolkata, India
Keywords
Mel frequency cepstral coefficients; Gaussian mixture models; principal component transformation; ensemble classification; classification accuracy; NTIMIT; recognition; classifiers; PCA
DOI
10.1142/S0218001414560060
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
For speaker identification, the approach proposed by Reynolds [IEEE Signal Process. Lett. 2 (1995) 46-48], which uses Gaussian Mixture Models (GMMs) with Mel Frequency Cepstral Coefficients (MFCCs) as features, is one of the most effective in the literature. The use of GMMs to model speaker identity is motivated by the interpretation that the Gaussian components represent general speaker-dependent spectral shapes, and by the ability of Gaussian mixtures to approximate arbitrarily shaped densities. In this work, we first illustrate, using a new bilingual speech corpus, how the well-known principal component transformation, combined with the principle of classifier combination, can significantly enhance the performance of MFCC-GMM speaker recognition systems. We then establish the same result rigorously on the benchmark speech corpus NTIMIT. A significant outcome of this work is that the proposed approach has the potential to enhance the performance of any speaker recognition system based on correlated features.
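The abstract combines two ideas: a principal component transformation that decorrelates MFCC-derived features, and classifier combination. The following stdlib-only Python sketch illustrates one plausible reading of that pipeline under stated assumptions; it is not the authors' implementation. It assumes a single PCA fitted on pooled training frames, replaces each speaker's GMM with a one-component diagonal-covariance Gaussian for brevity, uses synthetic correlated 3-D frames instead of real MFCCs, and fuses a raw-feature classifier with a PCA-feature classifier by summing utterance-level log-likelihoods. All names (`jacobi_eigen`, `pca_fit`, `identify`, `synth`, the speakers "alice" and "bob") are hypothetical.

```python
# Hypothetical sketch (stdlib only); not the authors' implementation.
import math
import random


def mean_vector(X):
    return [sum(col) / len(X) for col in zip(*X)]


def covariance_matrix(X):
    """Sample covariance matrix (divides by n - 1)."""
    mu = mean_vector(X)
    return [[sum((r[i] - mi) * (r[j] - mj) for r in X) / (len(X) - 1)
             for j, mj in enumerate(mu)] for i, mi in enumerate(mu)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def jacobi_eigen(S, sweeps=100, tol=1e-18):
    """Cyclic Jacobi eigensolver; returns eigenvalues (descending), eigenvector columns."""
    d = len(S)
    A = [row[:] for row in S]
    V = [[float(i == j) for j in range(d)] for i in range(d)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(d) for j in range(d) if i != j) < tol:
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                # Rotation angle that zeroes A[p][q], kept in [-pi/4, pi/4].
                theta = 0.5 * math.atan2(2 * A[p][q], A[p][p] - A[q][q])
                if theta > math.pi / 4:
                    theta -= math.pi / 2
                elif theta < -math.pi / 4:
                    theta += math.pi / 2
                c, s = math.cos(theta), math.sin(theta)
                R = [[float(i == j) for j in range(d)] for i in range(d)]
                R[p][p], R[q][q], R[p][q], R[q][p] = c, c, -s, s
                A = matmul([list(col) for col in zip(*R)], matmul(A, R))  # R^T A R
                V = matmul(V, R)
    order = sorted(range(d), key=lambda k: -A[k][k])
    return [A[k][k] for k in order], [[V[i][k] for k in order] for i in range(d)]


def pca_fit(X):
    """Principal component transformation fitted on pooled training frames."""
    _, V = jacobi_eigen(covariance_matrix(X))
    return mean_vector(X), V


def pca_apply(X, mu, V):
    d = len(mu)
    return [[sum((x[i] - mu[i]) * V[i][k] for i in range(d)) for k in range(d)]
            for x in X]


def fit_diag_gaussian(X):
    """One-component diagonal-covariance Gaussian, standing in for a GMM."""
    mu = mean_vector(X)
    var = [sum((x[j] - mu[j]) ** 2 for x in X) / len(X) + 1e-6
           for j in range(len(mu))]
    return mu, var


def log_likelihood(x, model):
    mu, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mu, var))


def identify(views, models_per_view):
    """Score-level fusion: add every classifier's utterance log-likelihood."""
    totals = {s: sum(log_likelihood(x, models[s])
                     for frames, models in zip(views, models_per_view)
                     for x in frames)
              for s in models_per_view[0]}
    return max(totals, key=totals.get)


def synth(rng, mean, n):
    """Hypothetical correlated 3-D 'MFCC-like' frames: x = M z + mean."""
    M = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.5, 0.3, 0.4]]
    zs = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n)]
    return [[sum(a * b for a, b in zip(row, z)) + m for row, m in zip(M, mean)]
            for z in zs]


rng = random.Random(0)
means = {"alice": [0.0, 0.0, 0.0], "bob": [2.0, -1.0, 1.0]}  # hypothetical speakers
train = {s: synth(rng, m, 200) for s, m in means.items()}
mu, V = pca_fit([x for frames in train.values() for x in frames])
raw_models = {s: fit_diag_gaussian(f) for s, f in train.items()}
pca_models = {s: fit_diag_gaussian(pca_apply(f, mu, V)) for s, f in train.items()}

tests = {s: synth(rng, m, 50) for s, m in means.items()}  # one utterance per speaker
results = {s: identify([u, pca_apply(u, mu, V)], [raw_models, pca_models])
           for s, u in tests.items()}
print(results)
```

Decorrelating first matters because a diagonal-covariance Gaussian ignores correlations between feature dimensions; after the rotation, the pooled covariance of the transformed frames is diagonal up to rounding. A real system would instead train multi-component GMMs by EM on actual MFCC vectors, and summing log-likelihoods is only one of several possible combination rules.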
Pages: 25
References (30 in total)
[1] Speaker identification by combining multiple classifiers using Dempster-Shafer theory of evidence [J].
Altinçay, H;
Demirekler, M.
SPEECH COMMUNICATION, 2003, 41(4): 531-547
[2] Babu K. Suri, 2013, INT J APPL INF SYST, V5, P15, DOI 10.5120/IJAIS13-450913
[3] Subband architecture for automatic speaker recognition [J].
Besacier, L;
Bonastre, JF.
SIGNAL PROCESSING, 2000, 80(7): 1245-1259
[4] Bose S., 2012, 3 ICSIIT 2012 INT C, P102
[5] Random forests [J].
Breiman, L.
MACHINE LEARNING, 2001, 45(1): 5-32
[7] Speaker recognition: A tutorial [J].
Campbell, JP.
PROCEEDINGS OF THE IEEE, 1997, 85(9): 1437-1462
[8] Chien J.-T., 2004, INTERSPEECH 2004 ICS, P1785
[9] Signature extraction using mutual interdependencies [J].
Claussen, Heiko;
Rosca, Justinian;
Damper, Robert.
PATTERN RECOGNITION, 2011, 44(3): 650-661
[10] Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences [J].
Davis, SB;
Mermelstein, P.
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1980, 28(4): 357-366