Fusion of Contrastive Acoustic Models for Parallel Phonotactic Spoken Language Identification

Cited by: 0

Authors:

Sim, Khe Chai [1]

Li, Haizhou [1]

Affiliation:

[1] Inst Infocomm Res, Singapore, Singapore

Source:

INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 | 2007

Keywords:

language identification; precision matrix modelling; acoustic modelling; maximum mutual information

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper investigates combining contrastive acoustic models for parallel phonotactic language identification systems. PRLM, a typical phonotactic system, uses a phone recogniser to extract phonotactic information from the speech data. Combining multiple PRLM systems together forms a Parallel PRLM (PPRLM) system. A standard PPRLM system utilises multiple phone recognisers trained on different languages and phone sets to provide diversification. In this paper, a new approach for PPRLM is proposed where phone recognisers with different acoustic models are used for the parallel systems. The STC and SPAM precision matrix modelling schemes as well as the MMI training criterion are used to produce contrastive acoustic models. Preliminary experimental results are reported on the NIST language recognition evaluation sets. With only two training corpora, a 12-way PPRLM system, using different acoustic modelling schemes, outperformed the standard 2-way PPRLM system by 2.0-5.0% absolute EER.


Pages: 541-544

Page count: 4

References (8 total):

[1] AXELROD S, 2002, P ICSLP

[2] Semi-tied covariance matrices for hidden Markov models [J]. Gales, MJF. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1999, 7 (03): 272-281

[3] MATEJKA JCP, 2005, P EUR SEPT

[4] Muthusamy YK, 1992, P INT C SPOK LANG PR, DOI 10.1145/3018009.3018049

[5] Minimum phone error training of precision matrix models [J]. Sim, KC; Gales, MJF. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (03): 882-889

[6] TONG R, 2006, IEEE P INT C AC SPEE

[7] Large scale discriminative training of hidden Markov models for speech recognition [J]. Woodland, PC; Povey, D. COMPUTER SPEECH AND LANGUAGE, 2002, 16 (01): 25-47

[8] Comparison of four approaches to automatic language identification of telephone speech [J]. Zissman, MA. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1996, 4 (01): 31-44
