An Ensemble Speaker and Speaking Environment Modeling Approach to Robust Speech Recognition

Cited by: 26

Authors:

Tsao, Yu ^{[1]}

Lee, Chin-Hui ^{[2]}

Affiliations:

[1] Natl Inst Informat & Commun Technol, Spoken Language Commun Grp, Kyoto 6190288, Japan

[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2009 / Vol. 17 / Issue 05

Keywords:

Environment modeling; noise robustness; MAXIMUM-LIKELIHOOD; ADAPTATION; COMPENSATION; NOISE; EIGENVOICE; ALGORITHMS;

DOI:

10.1109/TASL.2009.2016231

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

We propose an ensemble speaker and speaking environment modeling (ESSEM) approach to characterizing environments in order to enhance performance robustness of automatic speech recognition systems under adverse conditions. The ESSEM process comprises two phases, the offline and the online. In the offline phase, we prepare an ensemble speaker and speaking environment space formed by a collection of super-vectors. Each super-vector consists of the entire set of means from all the Gaussian mixture components of a set of hidden Markov models that characterizes a particular environment. In the online phase, with the ensemble environment space prepared in the offline phase, we estimate the super-vector for a new testing environment based on a stochastic matching criterion. In this paper, we focus on methods for enhancing the construction and coverage of the environment space in the offline phase. We first demonstrate environment clustering and partitioning algorithms to structure the environment space well; then, we propose a minimum classification error training algorithm to enhance discrimination across environment super-vectors and therefore broaden the coverage of the ensemble environment space. We evaluate the proposed ESSEM framework on the Aurora2 connected digit recognition task. Experimental results verify that ESSEM provides clear improvement over a baseline system without environmental compensation. Moreover, the performance of ESSEM can be further enhanced by using well-structured environment spaces. Finally, we confirm that ESSEM gives the best overall performance with an environment space refined by an integration of all techniques.
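
The super-vector idea in the abstract can be illustrated with a minimal sketch. It stacks the Gaussian mean vectors of each environment's model into one super-vector, then expresses a new environment's super-vector as a weighted combination of the ensemble. Note the assumptions: the function names (`build_supervector`, `estimate_weights`, `combine`) are hypothetical, the toy numbers are invented for illustration, and the weights are fitted by plain least squares against a known target, which is a simple stand-in and not the stochastic matching criterion the paper uses in its online phase.

```python
from typing import List, Sequence

Vector = List[float]


def build_supervector(means: Sequence[Sequence[float]]) -> Vector:
    """Concatenate the mean vectors of all Gaussian components into one super-vector."""
    return [x for mean in means for x in mean]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def solve_linear_system(a: List[List[float]], b: List[float]) -> Vector:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: super-vectors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def estimate_weights(space: List[Vector], target: Vector) -> Vector:
    """Least-squares weights w minimizing ||sum_p w[p] * space[p] - target||^2.

    Assumption: least squares replaces the paper's stochastic matching step.
    """
    gram = [[dot(u, v) for v in space] for u in space]
    rhs = [dot(u, target) for u in space]
    return solve_linear_system(gram, rhs)


def combine(space: List[Vector], weights: Vector) -> Vector:
    """Form the weighted sum of the environment super-vectors."""
    dim = len(space[0])
    return [sum(w * sv[d] for w, sv in zip(weights, space)) for d in range(dim)]


# Toy ensemble: two environments, each with two 2-dimensional Gaussian means.
env_a = build_supervector([[0.0, 0.0], [1.0, 1.0]])
env_b = build_supervector([[2.0, 0.0], [1.0, 3.0]])
space = [env_a, env_b]

# A "new environment" that lies inside the span of the ensemble.
target = combine(space, [0.25, 0.75])
weights = estimate_weights(space, target)
estimate = combine(space, weights)
```

In this toy setup the target is constructed from the ensemble itself, so the fitted weights reproduce it. In a real system the target super-vector is unknown and the weights would have to be estimated from test speech.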

Pages: 1025-1037

Page count: 13

50 items in total

[1] Two extensions to ensemble speaker and speaking environment modeling for robust automatic speech recognition
Tsao, Yu
Lee, Chin-Hui
2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 77 - 80
[2] ENSEMBLE SPEAKER AND SPEAKING ENVIRONMENT MODELING APPROACH WITH ADVANCED ONLINE ESTIMATION PROCESS
Tsao, Yu
Li, Jinyu
Lee, Chin-Hui
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 3833 - 3836
[3] A MAP-based Online Estimation Approach to Ensemble Speaker and Speaking Environment Modeling
Tsao, Yu
Matsuda, Shigeki
Hori, Chiori
Kashioka, Hideki
Lee, Chin-Hui
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (02) : 403 - 416
[4] A Vector Space Approach to Environment Modeling for Robust Speech Recognition
Tsao, Yu
Lee, Chin-Hui
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 785 - 788
[5] A LINEAR PROJECTION APPROACH TO ENVIRONMENT MODELING FOR ROBUST SPEECH RECOGNITION
Tsao, Yu
Huang, Chien-Lin
Matsuda, Shigeki
Hori, Chiori
Kashioka, Hideki
2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4329 - 4332
[6] An Ensemble Modeling Approach to Joint Characterization of Speaker and Speaking Environments
Tsao, Yu
Lee, Chin-Hui
INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2464 - 2467
[7] Improving the Ensemble Speaker and Speaking Environment Modeling Approach by Enhancing the Precision of the Online Estimation Process
Tsao, Yu
Lee, Chin-Hui
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1265 - 1268
[8] An Integrated Approach to Robust Speaker Identification and Speech Recognition
Kwan, C.
Yin, J.
Ayhan, B.
Chu, S.
Liu, X.
Puckett, K.
Zhao, Y.
Ho, K. C.
Kruger, M.
Sityar, I.
2008 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-8, 2008, : 1635 - +
[9] ROBUST SPEECH RECOGNITION THROUGH SELECTION OF SPEAKER AND ENVIRONMENT TRANSFORMS
Bilgi, Raghavendra
Joshi, Vikas
Umesh, S.
Garcia, L.
Benitez, C.
2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4333 - 4336
[10] COMBINING EIGENVOICE SPEAKER MODELING AND VTS-BASED ENVIRONMENT COMPENSATION FOR ROBUST SPEECH RECOGNITION
Ou, Zhijian
Deng, Kan
2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4673 - 4676
