A vector space modeling approach to spoken language identification

Cited by: 149

Authors:

Li, Haizhou [1]

Ma, Bin

Lee, Chin-Hui

Affiliations:

[1] Inst Infocomm Res, Singapore 119613, Singapore

[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007, Vol. 15, No. 1

Keywords:

acoustic segment models (ASMs); artificial neural network (ANN); spoken language identification; support vector machine (SVM); text categorization; vector space model (VSM);

DOI:

10.1109/TASL.2006.876860

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

We propose a novel approach to automatic spoken language identification (LID) based on vector space modeling (VSM). It is assumed that the overall sound characteristics of all spoken languages can be covered by a universal collection of acoustic units, which can be characterized by acoustic segment models (ASMs). A spoken utterance is then decoded into a sequence of ASM units. The ASM framework furthers the idea of language-independent phone models for LID by introducing an unsupervised learning procedure to circumvent the need for phonetic transcription. Analogous to representing a text document as a term vector, we convert a spoken utterance into a feature vector with its attributes representing the co-occurrence statistics of the acoustic units. As such, we can build a vector space classifier for LID. The proposed VSM approach leads to a discriminative classifier backend, which is demonstrated to give superior performance over a likelihood-based n-gram language modeling (LM) backend for long utterances. We evaluated the proposed VSM framework on the 1996 and 2003 NIST Language Recognition Evaluation (LRE) databases, achieving equal error rates (EERs) of 2.75% and 4.02% in the 1996 and 2003 LRE 30-s tasks, respectively, which are among the best results reported on these popular tasks.
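The step of turning a decoded unit sequence into a term vector can be illustrated with a minimal sketch. It is not the authors' implementation: the three-unit inventory, the function names `build_index` and `utterance_vector`, the choice of unigram and bigram terms, and the per-order relative-frequency normalization are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
from itertools import product


def build_index(units, max_n=2):
    """Map every n-gram (n = 1..max_n) over the unit inventory to a vector position."""
    terms = [gram for n in range(1, max_n + 1) for gram in product(units, repeat=n)]
    return {gram: i for i, gram in enumerate(terms)}


def utterance_vector(sequence, index, max_n=2):
    """Turn a decoded unit sequence into a vector of n-gram relative frequencies.

    Assumption: counts are normalized separately for each n-gram order.
    """
    vec = [0.0] * len(index)
    for n in range(1, max_n + 1):
        grams = [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]
        counts = Counter(grams)
        total = sum(counts.values())
        for gram, count in counts.items():
            vec[index[gram]] = count / total
    return vec


# Hypothetical inventory of three acoustic units and one decoded utterance.
units = ["a", "b", "c"]
index = build_index(units)
vec = utterance_vector(["a", "b", "a", "b", "c"], index)
```

With 3 units and n-grams up to order 2, the vector has 3 + 9 = 12 dimensions. Vectors of this kind, one per utterance, could then be passed to a discriminative classifier such as the SVM or ANN named in the keywords.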


Pages: 271-284

Page count: 14


