Latent prosody analysis for robust speaker identification

Times cited: 2

Authors:

Liao, Yuan-Fu [1]

Chen, Zi-He

Juang, Yau-Tarng

Affiliations:

[1] Natl Taipei Univ Technol, Dept Elect Engn, Taipei 10643, Taiwan

[2] Natl Cent Univ, Dept Elect Engn, Jhongli 32001, Taiwan

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007 / Vol. 15 / No. 6

Keywords:

latent prosody analysis; latent semantic analysis; probabilistic latent semantic analysis; speaker identification; speaker recognition; speech prosody

DOI:

10.1109/TASL.2007.896660

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Handsets that are not seen in the training phase (unseen handsets) are a significant source of performance degradation for speaker identification (SID) applications in the telecommunication environment. This paper proposes a novel latent prosody analysis (LPA) approach that automatically extracts the most discriminative prosodic cues to assist conventional spectral-feature-based SID. The LPA approach transforms the SID problem into a task resembling full-text document retrieval via 1) prosodic contour tokenization, 2) latent prosody analysis, and 3) speaker retrieval. Experimental results on the phonetically balanced, read-speech handset-TIMIT (HTIMIT) database demonstrated that fusing the LPA prosodic-feature-based SID systems with the maximum-likelihood a priori handset knowledge interpolation (ML-AKI) spectral-feature-based SID system outperformed both the pitch-and-energy Gaussian mixture model (Pitch-GMM) and the prosodic-state bigram (Bigram) counterparts, both when counting all handsets and when counting only unseen handsets.
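The three-stage pipeline named in the abstract (tokenize prosodic contours, analyze them in a latent space, retrieve the speaker) can be illustrated with a toy sketch. This is not the authors' implementation. The slope-sign tokenizer, the use of plain SVD-based latent semantic analysis (the paper's keywords also mention probabilistic LSA, which may be what it actually uses), cosine-similarity retrieval, and every function name and data value below are hypothetical assumptions chosen for illustration. The sketch requires NumPy.

```python
import numpy as np

# Hypothetical prosodic "vocabulary": sign of pitch slope x sign of energy slope.
VOCAB = [f"P{p}E{e}" for p in "+-" for e in "+-"]  # P+E+, P+E-, P-E+, P-E-


def tokenize_contour(pitch, energy):
    """Step 1 (assumed scheme): turn frame-level pitch/energy contours into tokens.

    Each transition between consecutive frames becomes one token; a flat
    segment is treated as falling ("-") for simplicity.
    """
    tokens = []
    for i in range(1, len(pitch)):
        dp = "+" if pitch[i] > pitch[i - 1] else "-"
        de = "+" if energy[i] > energy[i - 1] else "-"
        tokens.append(f"P{dp}E{de}")
    return tokens


def count_vector(tokens):
    """Bag-of-tokens term-frequency vector over VOCAB."""
    return np.array([tokens.count(w) for w in VOCAB], dtype=float)


def build_lsa(speaker_docs, k):
    """Step 2 (plain LSA via truncated SVD, standing in for the paper's latent model).

    speaker_docs holds one token list per enrolled speaker ("one document each").
    Returns U_k, s_k, V_k; row j of V_k is speaker j in the latent space.
    k must not exceed the rank of the term-by-speaker matrix.
    """
    A = np.column_stack([count_vector(doc) for doc in speaker_docs])
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k].T


def retrieve(query_tokens, U_k, s_k, V_k):
    """Step 3: fold the test utterance in and rank speakers by cosine similarity."""
    q_hat = (U_k.T @ count_vector(query_tokens)) / s_k  # q_hat = S_k^-1 U_k^T q
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    sims = (V_k @ q_hat) / np.maximum(norms, 1e-12)
    return int(np.argmax(sims)), sims


if __name__ == "__main__":
    speakers = [
        ["P+E+"] * 5 + ["P+E-"],      # speaker 0: mostly rising pitch and energy
        ["P-E+"] + ["P-E-"] * 5,      # speaker 1: mostly falling
        ["P+E-"] * 4 + ["P-E+"] * 4,  # speaker 2: mixed
    ]
    U_k, s_k, V_k = build_lsa(speakers, k=2)
    query = tokenize_contour([100, 110, 120, 130, 140, 150], [50, 55, 60, 58, 62, 65])
    best, sims = retrieve(query, U_k, s_k, V_k)
    print(best)  # → 0
```

Folding a query in as S_k^-1 U_k^T q maps an enrolled speaker's own training counts exactly onto that speaker's row of V_k, which is why cosine similarity in this space works as a retrieval score. The abstract also describes fusing the prosodic scores with an ML-AKI spectral system; that fusion step is not sketched here.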


Pages: 1870-1883

Number of pages: 14
