A vector space modeling approach to spoken language identification

Cited by: 149
Authors
Li, Haizhou [1]
Ma, Bin
Lee, Chin-Hui
Affiliations
[1] Inst Infocomm Res, Singapore 119613, Singapore
[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007 / Vol. 15 / No. 01
Keywords
acoustic segment models (ASMs); artificial neural network (ANN); spoken language identification; support vector machine (SVM); text categorization; vector space model (VSM);
DOI
10.1109/TASL.2006.876860
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
We propose a novel approach to automatic spoken language identification (LID) based on vector space modeling (VSM). It is assumed that the overall sound characteristics of all spoken languages can be covered by a universal collection of acoustic units, which can be characterized by the acoustic segment models (ASMs). A spoken utterance is then decoded into a sequence of ASM units. The ASM framework furthers the idea of language-independent phone models for LID by introducing an unsupervised learning procedure to circumvent the need for phonetic transcription. Analogous to representing a text document as a term vector, we convert a spoken utterance into a feature vector with its attributes representing the co-occurrence statistics of the acoustic units. As such, we can build a vector space classifier for LID. The proposed VSM approach leads to a discriminative classifier backend, which is demonstrated to give superior performance over a likelihood-based n-gram language modeling (LM) backend for long utterances. We evaluated the proposed VSM framework on the 1996 and 2003 NIST Language Recognition Evaluation (LRE) databases, achieving equal error rates (EERs) of 2.75% and 4.02% in the 1996 and 2003 LRE 30-s tasks, respectively, which represents one of the best results reported on these popular tasks.
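The utterance-to-vector idea in the abstract can be illustrated with a minimal sketch. It assumes an utterance has already been decoded into a sequence of acoustic unit labels; the unit names (`a1` to `a4`), the language labels, and the helper functions are hypothetical. The cosine nearest-centroid rule is a simple stand-in chosen for brevity and is not the SVM or ANN backend named in the keywords.

```python
from collections import Counter
from math import sqrt


def ngram_vector(units, max_n=2):
    """Map a decoded unit sequence to sparse n-gram relative frequencies."""
    vec = Counter()
    for n in range(1, max_n + 1):
        total = len(units) - n + 1  # number of n-grams of this order
        if total <= 0:
            continue
        for i in range(total):
            vec[tuple(units[i:i + n])] += 1.0 / total
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labeled_utterances):
    """Average the n-gram vectors of each language into one centroid."""
    sums, counts = {}, Counter()
    for label, units in labeled_utterances:
        sums.setdefault(label, Counter()).update(ngram_vector(units))
        counts[label] += 1
    return {
        label: {k: v / counts[label] for k, v in vec.items()}
        for label, vec in sums.items()
    }


def classify(units, centroids):
    """Return the language whose centroid is closest in cosine similarity."""
    vec = ngram_vector(units)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy training data: hypothetical unit sequences for two made-up languages.
train = [
    ("lang_A", "a1 a2 a1 a2 a3".split()),
    ("lang_A", "a2 a1 a2 a1".split()),
    ("lang_B", "a3 a4 a3 a4 a1".split()),
    ("lang_B", "a4 a3 a4 a4".split()),
]
centroids = train_centroids(train)
print(classify("a1 a2 a1".split(), centroids))
print(classify("a4 a3 a4".split(), centroids))
```

Unigram and bigram frequencies are used here as a simple proxy for the co-occurrence statistics the abstract mentions; a real system would have a much larger unit inventory and a trained discriminative classifier.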
Pages: 271-284
Page count: 14
Related papers
50 in total
  • [21] Automatic spoken language identification using MFCC based time series features
    Biswas, Mainak
    Rahaman, Saif
    Ahmadian, Ali
    Subari, Kamalularifin
    Singh, Pawan Kumar
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (07) : 9565 - 9595
  • [22] UNSUPERVISED NEURAL ADAPTATION MODEL BASED ON OPTIMAL TRANSPORT FOR SPOKEN LANGUAGE IDENTIFICATION
    Lu, Xugang
    Shen, Peng
    Tsao, Yu
    Kawai, Hisashi
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7228 - 7232
  • [23] Feature Representation of Short Utterances based on Knowledge Distillation for Spoken Language Identification
    Shen, Peng
    Lu, Xugang
    Li, Sheng
    Kawai, Hisashi
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1813 - 1817
  • [24] Common latent representation learning for low-resourced spoken language identification
    Chen Chen
    Yulin Bu
    Yong Chen
    Deyun Chen
    Multimedia Tools and Applications, 2024, 83 : 34515 - 34535
  • [25] Automatic spoken language identification using MFCC based time series features
    Mainak Biswas
    Saif Rahaman
    Ali Ahmadian
    Kamalularifin Subari
    Pawan Kumar Singh
    Multimedia Tools and Applications, 2023, 82 : 9565 - 9595
  • [26] Spoken Language Identification with Phonotactics Methods on Minangkabau, Sundanese, and Javanese Languages
    Safitri, Nur Endah
    Zahra, Amalia
    Adriani, Mirna
    SLTU-2016 5TH WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGIES FOR UNDER-RESOURCED LANGUAGES, 2016, 81 : 182 - 187
  • [27] Common latent representation learning for low-resourced spoken language identification
    Chen, Chen
    Bu, Yulin
    Chen, Yong
    Chen, Deyun
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (12) : 34515 - 34535
  • [28] Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech
    Valente, Martina
    Brugnara, Fabio
    Morrone, Giovanni
    Zovato, Enrico
    Badino, Leonardo
    INTERSPEECH 2024, 2024, : 1645 - 1649
  • [29] Improving Indian Spoken-Language Identification by Feature Selection in Duration Mismatch Framework
    Bakshi A.
    Kopparapu S.K.
    SN Computer Science, 2021, 2 (6)
  • [30] INTERACTIVE LEARNING OF TEACHER-STUDENT MODEL FOR SHORT UTTERANCE SPOKEN LANGUAGE IDENTIFICATION
    Shen, Peng
    Lu, Xugang
    Li, Sheng
    Kawai, Hisashi
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5981 - 5985