An Ensemble Speaker and Speaking Environment Modeling Approach to Robust Speech Recognition

Cited by: 26
Authors
Tsao, Yu [1]
Lee, Chin-Hui [2]
Affiliations
[1] Natl Inst Informat & Commun Technol, Spoken Language Commun Grp, Kyoto 6190288, Japan
[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Source
IEEE Transactions on Audio, Speech, and Language Processing, 2009, Vol. 17, No. 5
Keywords
Environment modeling; noise robustness; maximum likelihood; adaptation; compensation; noise; eigenvoice; algorithms
DOI
10.1109/TASL.2009.2016231
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We propose an ensemble speaker and speaking environment modeling (ESSEM) approach to characterizing environments in order to enhance the performance robustness of automatic speech recognition systems under adverse conditions. The ESSEM process comprises two phases: an offline phase and an online phase. In the offline phase, we prepare an ensemble speaker and speaking environment space formed by a collection of super-vectors. Each super-vector consists of the means of all Gaussian mixture components in a set of hidden Markov models (HMMs) that characterizes a particular environment. In the online phase, we use the ensemble environment space prepared offline to estimate the super-vector for a new testing environment based on a stochastic matching criterion. In this paper, we focus on methods for improving the construction and coverage of the environment space in the offline phase. We first present environment clustering and partitioning algorithms that give the environment space a good structure. We then propose a minimum classification error (MCE) training algorithm that enhances discrimination across environment super-vectors and thereby broadens the coverage of the ensemble environment space. We evaluate the proposed ESSEM framework on the Aurora2 connected-digit recognition task. Experimental results verify that ESSEM provides a clear improvement over a baseline system without environmental compensation, and that its performance can be further enhanced by using well-structured environment spaces. Finally, we confirm that ESSEM gives the best overall performance when the environment space is refined by integrating all of these techniques.
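The offline phase described in the abstract can be pictured with a small sketch. It builds one super-vector per environment and then groups the ensemble environment space into clusters. Several assumptions here do not come from the paper:

- An HMM set is represented as a plain dict mapping each state name to a list of Gaussian mean vectors.
- States are concatenated in sorted order, so that super-vectors from different environments line up dimension by dimension.
- Environments are grouped with plain k-means. The abstract does not name its clustering algorithm, so k-means is only a stand-in.

The helper names `build_super_vector`, `squared_distance`, and `kmeans` are hypothetical. The paper's partitioning, MCE training, and stochastic-matching online estimation are not shown.

```python
from typing import Dict, List, Sequence

# Hypothetical representation (an assumption, not the paper's data format):
# each HMM state maps to a list of Gaussian mixture components, and each
# component is given only by its mean vector.
HMMSet = Dict[str, List[List[float]]]


def build_super_vector(hmm_set: HMMSet) -> List[float]:
    """Concatenate every Gaussian mean of an HMM set into one super-vector.

    States are visited in sorted order (an assumed convention) so that
    super-vectors built for different environments share one layout.
    """
    sv: List[float] = []
    for state in sorted(hmm_set):
        for mean in hmm_set[state]:
            sv.extend(mean)
    return sv


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Plain k-means over super-vectors; returns one cluster index per vector.

    This is a stand-in for the paper's environment clustering. The first k
    vectors serve as initial centroids so that the result is deterministic.
    """
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: attach each super-vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy ensemble: four one-state, one-component "environments".
    envs = [
        {"s1": [[0.0, 0.0]]},
        {"s1": [[5.0, 5.0]]},
        {"s1": [[0.2, 0.1]]},
        {"s1": [[5.1, 4.9]]},
    ]
    space = [build_super_vector(e) for e in envs]
    print(kmeans(space, k=2))
```

In this toy ensemble the first and third environments lie close together, as do the second and fourth, so they fall into two clusters. The ordering convention in `build_super_vector` matters more than the clustering method: the super-vectors are compared dimension by dimension, so every environment must flatten its means in the same order.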
Pages: 1025–1037
Number of pages: 13