Latent prosody analysis for robust speaker identification

Cited by: 2
Authors
Liao, Yuan-Fu [1]
Chen, Zi-He
Juang, Yau-Tarng
Affiliations
[1] Natl Taipei Univ Technol, Dept Elect Engn, Taipei 10643, Taiwan
[2] Natl Cent Univ, Dept Elect Engn, Jhongli 32001, Taiwan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007 / Vol. 15 / No. 06
Keywords
latent prosody analysis; latent semantic analysis; probabilistic latent semantic analysis; speaker identification; speaker recognition; speech prosody;
DOI
10.1109/TASL.2007.896660
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Handsets that are not seen in the training phase (unseen handsets) are a significant source of performance degradation for speaker identification (SID) applications in the telecommunication environment. This paper proposes a novel latent prosody analysis (LPA) approach that automatically extracts the most discriminative prosodic cues to assist conventional spectral feature-based SID. The concept of the LPA approach is to transform the SID problem into a full-text document retrieval-like task via 1) prosodic contour tokenization, 2) latent prosody analysis, and 3) speaker retrieval. Experimental results on the phonetically balanced, read-speech, handset-TIMIT (HTIMIT) database demonstrated that fusing the LPA prosodic feature-based SID systems with the maximum-likelihood a priori handset knowledge interpolation (ML-AKI) spectral feature-based SID system outperformed both the pitch and energy Gaussian mixture model (Pitch-GMM) and the bigram of the prosodic state (Bigram) counterparts, both when all handsets are counted and when only unseen handsets are counted.
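The document-retrieval framing in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the rise/fall/steady tokenizer, the 2 Hz threshold, the bigram "terms", and all function and speaker names below are hypothetical, and the latent analysis step (LSA or PLSA) is deliberately omitted in favor of plain TF-IDF cosine matching. It only shows how pitch contours might be turned into "documents" of prosodic tokens and how a query utterance could then be matched to a speaker by retrieval.

```python
import math
from collections import Counter

def tokenize_contour(pitch, threshold=2.0):
    """Map a pitch contour (Hz per frame) to symbols R (rise), F (fall), S (steady).
    The symbol set and threshold are illustrative assumptions."""
    tokens = []
    for prev, cur in zip(pitch, pitch[1:]):
        delta = cur - prev
        if delta > threshold:
            tokens.append("R")
        elif delta < -threshold:
            tokens.append("F")
        else:
            tokens.append("S")
    return tokens

def bigram_counts(tokens):
    """Count adjacent token pairs; these play the role of 'words' in a document."""
    return Counter(a + b for a, b in zip(tokens, tokens[1:]))

def build_index(speaker_contours):
    """Build one TF-IDF vector per speaker 'document' plus the shared IDF table."""
    docs = {name: bigram_counts(tokenize_contour(p))
            for name, p in speaker_contours.items()}
    n = len(docs)
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())
    # Smoothed IDF so that every observed term keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    vectors = {name: {t: c * idf[t] for t, c in counts.items()}
               for name, counts in docs.items()}
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def identify(query_pitch, vectors, idf):
    """Return the enrolled speaker whose vector is closest to the query."""
    counts = bigram_counts(tokenize_contour(query_pitch))
    query = {t: c * idf.get(t, 0.0) for t, c in counts.items()}
    return max(vectors, key=lambda name: cosine(query, vectors[name]))

# Toy enrollment data: one speaker with rising pitch, one with falling pitch.
vectors, idf = build_index({
    "spk_A": [100, 105, 110, 115, 120, 125],
    "spk_B": [200, 195, 190, 185, 180, 175],
})
best = identify([150, 156, 161, 167, 172], vectors, idf)
```

In this toy setup the rising query contour shares its bigrams only with the rising enrollment contour, so retrieval picks that speaker. A latent-analysis variant would first project the speaker vectors into a lower-dimensional space before computing similarity.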
Pages: 1870-1883
Page count: 14