Model Adaptation for Automatic Speech Recognition Based on Multiple Time Scale Evolution

Cited by: 0
Authors
Watanabe, Shinji [1]
Nakamura, Atsushi [1]
Juang, Biing-Hwang [2]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Tokyo, Japan
[2] Georgia Inst Technol, Ctr Signal & Image Proc, Atlanta, GA 30332 USA
Source
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011
Keywords
speech recognition; incremental adaptation; multiscale; time evolution system
DOI
Not available
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In real-world conversation, changes in speech characteristics originate from various factors and occur at various (temporal) rates. Because these temporal changes have their own dynamics, we propose extending single (time-)incremental adaptation to a multiscale adaptation, which has the potential to greatly increase the model's robustness by including adaptation mechanisms that approximate the nature of the characteristic change. The formulation of incremental adaptation assumes a time evolution system for the model, in which the posterior distributions used in the decision process are successively updated on a macroscopic time scale in accordance with Kalman filter theory. In this paper, we extend the original incremental adaptation scheme, based on a single time scale, to multiple time scales, and apply the method to the adaptation of both the acoustic model and the language model. We further investigate methods for integrating the multiscale adaptation schemes to achieve robust speech recognition performance. Large-vocabulary continuous speech recognition experiments on English and Japanese lectures revealed the importance of modeling multiscale properties in speech recognition.
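The record does not reproduce the paper's update equations. As a rough, non-authoritative orientation, the LaTeX sketch below shows a generic Bayesian predict/update recursion of the kind the abstract alludes to when it describes posterior distributions evolving over macroscopic epochs in analogy with the Kalman filter; the notation ($\theta_k$ for model parameters, $O_k$ for the observations of epoch $k$) is ours, not the paper's.

\begin{align}
  % Prediction: evolve the epoch-(k-1) posterior into a prior for epoch k
  p(\theta_k \mid O_{1:k-1}) &= \int p(\theta_k \mid \theta_{k-1})\,
      p(\theta_{k-1} \mid O_{1:k-1})\, d\theta_{k-1}, \\
  % Update: absorb the new observations O_k of epoch k
  p(\theta_k \mid O_{1:k}) &\propto p(O_k \mid \theta_k)\, p(\theta_k \mid O_{1:k-1}).
\end{align}

A multiscale variant, as we understand the abstract, would maintain one such recursion per time scale $s$, with parameters $\theta_k^{(s)}$ updated at that scale's own epoch length, and combine the scale-wise posteriors at decoding time; the paper's exact combination rule is not given in this record.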
Pages: 1088+
Number of pages: 2