Analysis of Distance Measures for Pre-quantization before Feature Extraction in Automatic Speaker Recognition

Cited by: 0
Authors
Sarkar, Gourav [1]
Saha, Goutam [1]
Affiliation
[1] Indian Inst Technol, Dept Elect & Elect Commun Engn, Kharagpur 721302, W Bengal, India
Source
2009 ANNUAL IEEE INDIA CONFERENCE (INDICON 2009) | 2009
Keywords
Correlation; distance measure; log-spectral distance; pre-quantization; speaker recognition; IDENTIFICATION;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
The total recognition time and the memory requirement in speaker recognition are governed mainly by the number of speakers, the number of frame vectors in the test sequence, and the feature dimensionality. Adjacent frame vectors can be similar in the feature space because the articulators move slowly. Hence, efficient frame selection techniques that pick non-redundant frames in the preprocessing stage can be very effective in real-time applications of such a recognition system. In pre-quantization (PQ), a new sequence of frames Y is selected from the original frames X such that Y is shorter than X. In this paper we propose different distance measure techniques for selecting frames by exploiting the redundancies between consecutive frames. The aim is not only to reduce the number of frames for feature extraction but also to keep the recognition accuracy reasonably high by selecting suitable frames that contain speaker-specific information. The techniques are evaluated on two telephone speech databases, POLYCOST and KING.
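The frame selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not say which distance measures or selection rule the paper uses, so the Euclidean distance, the fixed `threshold`, the rule of comparing each frame against the last retained frame, and the names `euclidean` and `pre_quantize` are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two frame vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pre_quantize(
    frames: List[Frame],
    threshold: float,
    distance: Callable[[Frame, Frame], float] = euclidean,
) -> List[Frame]:
    """Select a shorter sequence Y from the original frames X.

    Hypothetical rule: always keep the first frame, then keep a frame
    only if its distance to the last kept frame exceeds `threshold`.
    """
    if not frames:
        return []
    selected = [frames[0]]
    for frame in frames[1:]:
        # Frames close to the last kept frame are treated as redundant.
        if distance(selected[-1], frame) > threshold:
            selected.append(frame)
    return selected


# Toy frames: near-duplicates of a retained frame are dropped.
X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.05, 1.0], [3.0, 3.0]]
Y = pre_quantize(X, threshold=0.5)
```

The `distance` parameter is left pluggable so that another measure, such as a log-spectral or correlation-based distance named in the keywords, could be substituted for the assumed Euclidean one.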
Pages: 91-94
Number of pages: 4
Related papers
9 in total
[1]   Speaker recognition: A tutorial [J].
Campbell, JP .
PROCEEDINGS OF THE IEEE, 1997, 85 (09) :1437-1462
[2]  
Cha S.-H., 2007, City, V1, P300
[3]   Real-time speaker identification and verification [J].
Kinnunen, T ;
Karpov, E ;
Fränti, P .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (01) :277-288
[4]   An overview of text-independent speaker recognition: From features to supervectors [J].
Kinnunen, Tomi ;
Li, Haizhou .
SPEECH COMMUNICATION, 2010, 52 (01) :12-40
[5]  
Ong S, 1996, ISSPA 96 - FOURTH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, PROCEEDINGS, VOLS 1 AND 2, P369
[6]   ROBUST TEXT-INDEPENDENT SPEAKER IDENTIFICATION USING GAUSSIAN MIXTURE SPEAKER MODELS [J].
REYNOLDS, DA ;
ROSE, RC .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1995, 3 (01) :72-83
[7]   Experimental Evaluation of Features for Robust Speaker Identification [J].
Reynolds, Douglas A. .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1994, 2 (04) :639-643
[8]   A WEIGHTED CEPSTRAL DISTANCE MEASURE FOR SPEECH RECOGNITION [J].
TOHKURA, Y .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1987, 35 (10) :1414-1422
[9]   Comparison of linear prediction cepstrum coefficients and Mel-Frequency Cepstrum Coefficients for language identification [J].
Wong, E ;
Sridharan, S .
PROCEEDINGS OF 2001 INTERNATIONAL SYMPOSIUM ON INTELLIGENT MULTIMEDIA, VIDEO AND SPEECH PROCESSING, 2001, :95-98