A vector space modeling approach to spoken language identification

Cited by: 149
Authors
Li, Haizhou [1]
Ma, Bin
Lee, Chin-Hui
Affiliations
[1] Inst Infocomm Res, Singapore 119613, Singapore
[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007, Vol. 15, No. 1
Keywords
acoustic segment models (ASMs); artificial neural network (ANN); spoken language identification; support vector machine (SVM); text categorization; vector space model (VSM)
DOI
10.1109/TASL.2006.876860
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We propose a novel approach to automatic spoken language identification (LID) based on vector space modeling (VSM). It is assumed that the overall sound characteristics of all spoken languages can be covered by a universal collection of acoustic units, which can be characterized by acoustic segment models (ASMs). A spoken utterance is then decoded into a sequence of ASM units. The ASM framework furthers the idea of language-independent phone models for LID by introducing an unsupervised learning procedure to circumvent the need for phonetic transcription. Analogous to representing a text document as a term vector, we convert a spoken utterance into a feature vector whose attributes represent the co-occurrence statistics of the acoustic units. As such, we can build a vector space classifier for LID. The proposed VSM approach leads to a discriminative classifier backend, which is demonstrated to give superior performance over a likelihood-based n-gram language modeling (LM) backend for long utterances. We evaluated the proposed VSM framework on the 1996 and 2003 NIST Language Recognition Evaluation (LRE) databases, achieving equal error rates (EERs) of 2.75% and 4.02% in the 1996 and 2003 LRE 30-s tasks, respectively, which are among the best results reported on these popular tasks.
Pages: 271-284
Page count: 14
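
The vector-space idea described in the abstract (an utterance decoded into acoustic units, then represented by the co-occurrence statistics of those units) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the three-symbol unit inventory, the toy decoded sequences, the choice of bigrams as the co-occurrence statistic, and the language labels `lang_x` and `lang_y`. The keywords mention SVM and ANN backends; this sketch swaps in a much simpler nearest-centroid cosine classifier and is not the authors' implementation.

```python
from collections import Counter
from itertools import product
import math


def ngram_vector(units, vocab, n=2):
    """Map a decoded unit sequence to a vector of normalized n-gram counts."""
    counts = Counter(tuple(units[i:i + n]) for i in range(len(units) - n + 1))
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return [counts[gram] / total for gram in vocab]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classify(vec, centroids):
    """Return the language whose centroid is most similar to vec."""
    return max(centroids, key=lambda lang: cosine(vec, centroids[lang]))


# Hypothetical unit inventory; a real system would use a learned ASM set.
UNITS = ["a", "b", "c"]
VOCAB = list(product(UNITS, repeat=2))  # all possible bigrams of units

# Toy "decoded" training utterances per language (invented data).
train = {
    "lang_x": [list("aabaabab"), list("abaabaa")],
    "lang_y": [list("cbccbcbc"), list("bccbccb")],
}

centroids = {
    lang: centroid([ngram_vector(utt, VOCAB) for utt in utts])
    for lang, utts in train.items()
}

predicted = classify(ngram_vector(list("abaab"), VOCAB), centroids)
print(predicted)
```

Each utterance becomes a fixed-length vector regardless of its duration, which is what allows a standard discriminative classifier to be trained on top of it in place of the centroid rule used here.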