Homogenous ensemble phonotactic language recognition based on SVM supervector reconstruction

Authors
Wei-Wei Liu
Wei-Qiang Zhang
Michael T Johnson
Jia Liu
Affiliations
[1] Tsinghua University, Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering
[2] General Logistics Department, General Communication Station
[3] Marquette University, Department of Electrical and Computer Engineering
Source
EURASIP Journal on Audio, Speech, and Music Processing | Vol. 2014
Keywords
Phonotactic language recognition; Support vector machine (SVM) supervector reconstruction; Phone recognition-vector space modeling (PR-VSM)
DOI
Not available
CLC number
Subject classification number
Abstract
Acoustic and phonotactic spoken language recognition (SLR) systems are currently in wide use. To achieve better performance, researchers combine multiple subsystems, often with results much better than those of a single SLR system. Phonotactic SLR subsystems may differ in their acoustic feature vectors or may include multiple language-specific phone recognizers and different acoustic models. These methods achieve good performance, but usually at high computational cost. In this paper, a new form of diversification for phonotactic language recognition systems is proposed, which builds vector space models by support vector machine (SVM) supervector reconstruction (SSR). In this architecture, the subsystems share the same feature extraction, decoding, and N-gram counting preprocessing steps, but each models the data in a different vector space by using the SSR algorithm, without significant additional computation. We term this a homogeneous ensemble phonotactic language recognition (HEPLR) system. The system integrates three different SVM supervector reconstruction algorithms: relative SVM supervector reconstruction, functional SVM supervector reconstruction, and perturbing SVM supervector reconstruction. All of the algorithms are combined using a linear discriminant analysis-maximum mutual information (LDA-MMI) backend to improve language recognition evaluation (LRE) accuracy. Evaluated on the National Institute of Standards and Technology (NIST) LRE 2009 task, the proposed HEPLR system outperforms a baseline phone recognition-vector space modeling (PR-VSM) system with minimal extra computational cost. The HEPLR system achieves equal error rates (EER) of 1.39%, 3.63%, and 14.79% for the 30-, 10-, and 3-s test conditions, respectively, corresponding to relative improvements of 6.06%, 10.15%, and 10.53% over the baseline system.
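The relative improvements quoted in the abstract can be related to the baseline EERs with a short calculation. The sketch below assumes the conventional definition, relative improvement = (baseline EER - system EER) / baseline EER, which the abstract does not state explicitly. The function names are illustrative, and the baseline values it prints are implied by the reported figures under that assumption, not quoted from the paper.

```python
# Assumption: relative improvement (%) = 100 * (baseline - system) / baseline.
# The abstract reports only the system EER and the relative improvement,
# so the baseline EER is back-derived here, not taken from the paper.

def implied_baseline_eer(system_eer: float, rel_improvement: float) -> float:
    """Return the baseline EER (%) implied by a system EER (%) and a
    relative improvement (%), under the definition assumed above."""
    return system_eer / (1.0 - rel_improvement / 100.0)


def relative_improvement(baseline_eer: float, system_eer: float) -> float:
    """Return the relative improvement (%) of system_eer over baseline_eer."""
    return 100.0 * (baseline_eer - system_eer) / baseline_eer


# (system EER %, relative improvement %) per test condition, from the abstract.
reported = {
    "30 s": (1.39, 6.06),
    "10 s": (3.63, 10.15),
    "3 s": (14.79, 10.53),
}

implied = {
    condition: implied_baseline_eer(eer, rel)
    for condition, (eer, rel) in reported.items()
}

for condition, baseline in implied.items():
    eer, rel = reported[condition]
    print(f"{condition}: system EER {eer:.2f}%, "
          f"implied baseline EER {baseline:.2f}% "
          f"(relative improvement {rel:.2f}%)")
```

Because the improvements are relative, the 3-s condition shows the largest absolute EER reduction even though its relative gain is close to that of the 10-s condition.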
Related papers
50 in total
  • [1] Homogenous ensemble phonotactic language recognition based on SVM supervector reconstruction
    Liu, Wei-Wei
    Zhang, Wei-Qiang
    Johnson, Michael T.
    Liu, Jia
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2014, : 1 - 13
  • [2] IMPROVED PHONOTACTIC LANGUAGE RECOGNITION BASED ON RNN FEATURE RECONSTRUCTION
    Liu, Wei-Wei
    Zhang, Wei-Qiang
    Shi, Yongzhe
    Ji, An
    Xu, Jiaming
    Liu, Jia
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [3] EMOTIONAL SPEECH RECOGNITION BASED ON SVM WITH GMM SUPERVECTOR
    Chen, Yanxiang
    Xie, Jian
    Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine
    Journal of Electronics (China), 2012, (Z2) : 339 - 344
  • [4] EMOTIONAL SPEECH RECOGNITION BASED ON SVM WITH GMM SUPERVECTOR
    Chen, Yanxiang
    Xie, Jian
    Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, School of Computer Science Information, Hefei University of Technology, Hefei, China
    Journal of Electronics (China), 2012, 29 (Z2) : 339 - 344
  • [5] GMM supervector based SVM with spectral features for speech emotion recognition
    Hu, Hao
    Xu, Ming-Xing
    Wu, Wei
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 413 - +
  • [6] A GMM SUPERVECTOR KERNEL WITH THE BHATTACHARYYA DISTANCE FOR SVM BASED SPEAKER RECOGNITION
    You, Chang Huai
    Lee, Kong Aik
    Li, Haizhou
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4221 - 4224
  • [7] Advances in Phonotactic Language Recognition
    Glembek, Ondrej
    Matejka, Pavel
    Burget, Lukas
    Mikolov, Tomas
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 743 - 746
  • [8] An SVM Kernel With GMM-Supervector Based on the Bhattacharyya Distance for Speaker Recognition
    You, Chang Huai
    Lee, Kong Aik
    Li, Haizhou
    IEEE SIGNAL PROCESSING LETTERS, 2009, 16 (1-3) : 49 - 52
  • [9] Dialect Recognition Using a Phone-GMM-Supervector-Based SVM Kernel
    Biadsy, Fadi
    Hirschberg, Julia
    Collins, Michael
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 753 - +
  • [10] Using a Kind of Novel Phonotactic Information for SVM Based Speaker Recognition
    Zhang, Xiang
    Suo, Hongbin
    Zhao, Qingwei
    Yan, Yonghong
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2009, E92D (04) : 746 - 749