Towards improving the performance of speaker recognition systems

Citations: 0
Authors
Johnson, Neethu [1]
George, Kuruvachan K. [2]
Kumar, Santhosh C. [2]
Raj, Reghu P. C. [1]
Affiliations
[1] Govt Engn Coll, Dept Comp Sci & Engn, Palakkad, Kerala, India
[2] Amrita Vishwa Vidyapeetham, Dept Elect & Commun Engn, Machine Intelligence Res Lab, Coimbatore, Tamil Nadu, India
Source
2014 FIRST INTERNATIONAL CONFERENCE ON COMPUTATIONAL SYSTEMS AND COMMUNICATIONS (ICCSC) | 2014
Keywords
Speaker recognition; spectral matching based VAD; total variability; i-vector; WCCN; cosine scoring; VERIFICATION;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
This paper studies the contribution of different phones in speech data towards improving the performance of text- and language-independent speaker recognition systems. The work is motivated by the fact that removing silence segments from the speech data improves system performance significantly, since those segments contain no speaker-specific information. It is also clear from the literature that not all phones in the speech data contain an equal amount of speaker-specific information, and that the performance of speaker recognition systems depends on this information. In addition to the silence segments, our work empirically finds 18 other diluent phones that have minimal speaker discrimination capability. We propose a preprocessing stage that recursively identifies the full set of non-informative phones and removes them along with the silence segments. Results show that a state-of-the-art i-vector system using the phone-removed preprocessed data outperforms the baseline i-vector system. We report absolute improvements of 1%, 1%, 2%, 2% and 1% in EER for test sets collected through the Digital Voice Recorder, Headset, Mobile Phone 1, Mobile Phone 2 and Tablet PC channels, respectively, on the IITG-MV database.
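The two ideas the abstract relies on, dropping frames that belong to non-informative phones and measuring performance by equal error rate (EER), can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the frame-level phone labels, and the example drop set (`"sil"`, `"t"`) are hypothetical, since the abstract does not list the 18 diluent phones or describe how frames are labelled.

```python
from typing import List, Sequence, Set


def drop_phone_frames(
    frames: Sequence[Sequence[float]],
    labels: Sequence[str],
    drop: Set[str],
) -> List[Sequence[float]]:
    """Keep only the feature frames whose phone label is not in `drop`.

    `labels[i]` is assumed to be the phone label of `frames[i]`
    (hypothetical input; the record does not say how labels are obtained).
    """
    if len(frames) != len(labels):
        raise ValueError("frames and labels must have the same length")
    return [f for f, p in zip(frames, labels) if p not in drop]


def equal_error_rate(
    target_scores: Sequence[float],
    impostor_scores: Sequence[float],
) -> float:
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where false-accept and false-reject rates are closest.
    """
    if not target_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy data: four one-dimensional frames with made-up phone labels.
frames = [[0.1], [0.2], [0.3], [0.4]]
labels = ["sil", "aa", "t", "iy"]
kept = drop_phone_frames(frames, labels, drop={"sil", "t"})

# Toy verification scores (e.g. cosine scores between i-vectors).
eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.5])
```

An "absolute improvement of 1% in EER" then means that the value returned by `equal_error_rate` falls by 0.01 between the baseline and the phone-removed system on the same trial list.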
Pages: 38-41
Page count: 4