A vector space modeling approach to spoken language identification

Cited by: 149

Authors:

Li, Haizhou [1]

Ma, Bin

Lee, Chin-Hui

Affiliations:

[1] Inst Infocomm Res, Singapore 119613, Singapore

[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007, Vol. 15, No. 1

Keywords:

acoustic segment models (ASMs); artificial neural network (ANN); spoken language identification; support vector machine (SVM); text categorization; vector space model (VSM);

DOI:

10.1109/TASL.2006.876860

Chinese Library Classification (CLC):

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

We propose a novel approach to automatic spoken language identification (LID) based on vector space modeling (VSM). It is assumed that the overall sound characteristics of all spoken languages can be covered by a universal collection of acoustic units, which can be characterized by acoustic segment models (ASMs). A spoken utterance is then decoded into a sequence of ASM units. The ASM framework furthers the idea of language-independent phone models for LID by introducing an unsupervised learning procedure to circumvent the need for phonetic transcription. Analogous to representing a text document as a term vector, we convert a spoken utterance into a feature vector with its attributes representing the co-occurrence statistics of the acoustic units. As such, we can build a vector space classifier for LID. The proposed VSM approach leads to a discriminative classifier backend, which is demonstrated to give superior performance over a likelihood-based n-gram language modeling (LM) backend for long utterances. We evaluated the proposed VSM framework on the 1996 and 2003 NIST Language Recognition Evaluation (LRE) databases, achieving equal error rates (EERs) of 2.75% and 4.02% in the 1996 and 2003 LRE 30-s tasks, respectively, which are among the best results reported on these popular tasks.
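
The step of turning a decoded unit sequence into a term-vector-like feature vector can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ngram_vector`, the toy unit labels, the restriction to unigrams and bigrams, and the use of plain relative frequencies are all assumptions made for illustration, since this record does not state the exact n-gram orders or weighting used in the paper.

```python
from collections import Counter
from itertools import product


def ngram_vector(units, vocab, n_max=2):
    """Map a decoded unit sequence to a fixed-length vector of n-gram
    relative frequencies (hypothetical helper, for illustration only).

    units : list of unit labels, e.g. the output of an ASM decoder
    vocab : list of all unit labels (the universal unit inventory)
    n_max : highest n-gram order to include (assumed to be 2 here)
    """
    # One dimension per possible n-gram over the unit inventory.
    dims = [g for n in range(1, n_max + 1) for g in product(vocab, repeat=n)]

    # Count every n-gram that actually occurs in the utterance.
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(units) - n + 1):
            counts[tuple(units[i:i + n])] += 1

    # Normalize each count by the number of n-gram positions of that order.
    vec = []
    for g in dims:
        positions = max(len(units) - len(g) + 1, 1)
        vec.append(counts[g] / positions)
    return dims, vec


# Toy utterance over a two-unit inventory (labels are made up).
dims, vec = ngram_vector(["a", "b", "a", "b"], vocab=["a", "b"])
for g, v in zip(dims, vec):
    print(g, round(v, 3))
```

Vectors of this kind, one per utterance, could then be passed to a discriminative classifier such as the SVM or ANN named in the keywords; that stage is not shown here.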


Pages: 271-284

Page count: 14
