A vector space modeling approach to spoken language identification

Cited by: 149
Authors
Li, Haizhou [1]
Ma, Bin
Lee, Chin-Hui
Affiliations
[1] Inst Infocomm Res, Singapore 119613, Singapore
[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007 / Vol. 15 / No. 01
Keywords
acoustic segment models (ASMs); artificial neural network (ANN); spoken language identification; support vector machine (SVM); text categorization; vector space model (VSM)
DOI
10.1109/TASL.2006.876860
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
We propose a novel approach to automatic spoken language identification (LID) based on vector space modeling (VSM). It is assumed that the overall sound characteristics of all spoken languages can be covered by a universal collection of acoustic units, which can be characterized by acoustic segment models (ASMs). A spoken utterance is then decoded into a sequence of ASM units. The ASM framework furthers the idea of language-independent phone models for LID by introducing an unsupervised learning procedure to circumvent the need for phonetic transcription. Analogous to representing a text document as a term vector, we convert a spoken utterance into a feature vector whose attributes represent the co-occurrence statistics of the acoustic units. As such, we can build a vector space classifier for LID. The proposed VSM approach leads to a discriminative classifier backend, which is demonstrated to give superior performance over a likelihood-based n-gram language modeling (LM) backend for long utterances. We evaluated the proposed VSM framework on the 1996 and 2003 NIST Language Recognition Evaluation (LRE) databases, achieving equal error rates (EERs) of 2.75% and 4.02% in the 1996 and 2003 LRE 30-s tasks, respectively, which are among the best results reported on these popular tasks.
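The conversion from a decoded unit sequence to a feature vector can be illustrated with a minimal sketch. It assumes raw unigram and bigram counts as the co-occurrence statistics; the abstract does not specify the n-gram order or any weighting, so the function name `bag_of_sounds_vector`, the `max_order` parameter, and the three-unit inventory are all hypothetical choices made for illustration. The discriminative classifier that would consume these vectors is not shown.

```python
from collections import Counter
from itertools import product


def bag_of_sounds_vector(units, inventory, max_order=2):
    """Map a decoded unit sequence to a fixed-length n-gram count vector.

    Assumption: raw counts of all n-grams up to max_order; the paper's
    actual statistics and weighting may differ.
    """
    # Fixed vocabulary: every n-gram (n = 1..max_order) over the unit inventory.
    vocab = [
        gram
        for n in range(1, max_order + 1)
        for gram in product(inventory, repeat=n)
    ]
    # Count the n-grams that actually occur in this utterance.
    counts = Counter(
        tuple(units[i:i + n])
        for n in range(1, max_order + 1)
        for i in range(len(units) - n + 1)
    )
    # Counter returns 0 for n-grams that never occurred.
    return [counts[gram] for gram in vocab]


inventory = ["a", "b", "c"]        # hypothetical acoustic unit labels
utterance = ["a", "b", "a", "c"]   # hypothetical decoded sequence
vec = bag_of_sounds_vector(utterance, inventory)
print(len(vec), vec)
```

With an inventory of 3 units and `max_order=2`, every utterance maps to a vector of 3 + 9 = 12 dimensions regardless of its length, which is what allows a standard vector space classifier to be trained on utterances of varying duration.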
Pages: 271-284
Page count: 14