Rapid environment adaptation method based on HMM composition with prior noise GMM and multi-SNR models for noisy speech recognition

Cited by: 0
Authors
Ida, M [1]
Nakamura, S [1]
Affiliation
[1] ATR Spoken Language Translat Res Labs, Kyoto 6190288, Japan
Source
ELECTRONICS AND COMMUNICATIONS IN JAPAN PART II-ELECTRONICS | 2004 / Vol. 87 / No. 06
Keywords
environment adaptation; HMM composition; noise model; nonstationary noise; multipath model;
DOI
10.1002/ecjb.20093
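Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification code
0808 ; 0809 ;
Abstract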
When speech recognition systems are used in real environments, surrounding noise inevitably mixes into the input speech and degrades recognition performance. In most cases it is difficult to predict the noise that will be mixed in, and the mismatch in noise environment between the input signal and the acoustic model is one cause of the degradation. It is therefore desirable to construct an acoustic model that is robust to the mixing of various kinds of noise. The noise-mixing problem has two aspects: the diversity of noise types and the diversity of SNR values. In this paper, HMM composition using weight adaptation of the noise GMM is applied to the first aspect, and the multi-SNR path model is applied to the second. The combination of these two approaches is evaluated in noisy speech recognition experiments using a travel conversation task and the AURORA2 task. In the AURORA2 task at SNR = 5 dB, 1 second of adaptation data improves the recognition rate by 53% compared to the baseline HMM. This is comparable to conventional HMM composition with 10 seconds of adaptation data. (C) 2004 Wiley Periodicals, Inc.
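The abstract does not spell out how the noise GMM weights are adapted. As a rough illustration only, the sketch below assumes a prior noise GMM with fixed diagonal-covariance components and re-estimates just its mixture weights by EM from a short stretch of noise frames. The function names, the feature layout, and the choice of EM are all assumptions made for this example, not details taken from the paper.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def adapt_gmm_weights(frames, means, variances, weights, n_iter=10):
    """Hypothetical helper: EM update of mixture weights only.

    Means and variances of the prior noise GMM stay fixed, so only one
    number per component is estimated from the adaptation frames.
    """
    weights = list(weights)
    if not frames:
        return weights  # nothing to adapt on
    for _ in range(n_iter):
        counts = [0.0] * len(weights)
        for x in frames:
            # Joint log-likelihood of frame x with each component.
            log_p = [
                math.log(w) + log_gauss_diag(x, m, v) if w > 0.0 else float("-inf")
                for w, m, v in zip(weights, means, variances)
            ]
            # Log-sum-exp normalization to get component posteriors.
            peak = max(log_p)
            exp_p = [math.exp(lp - peak) for lp in log_p]
            total = sum(exp_p)
            for k, e in enumerate(exp_p):
                counts[k] += e / total
        # M-step: new weight = average posterior over the adaptation frames.
        weights = [c / len(frames) for c in counts]
    return weights


# Toy usage: two 1-D prior noise components, adaptation frames near the second.
adapted = adapt_gmm_weights(
    frames=[[4.8], [5.1], [5.3], [4.9]],
    means=[[0.0], [5.0]],
    variances=[[1.0], [1.0]],
    weights=[0.5, 0.5],
)
print(adapted)
```

Because only the weights are re-estimated, very few parameters have to be learned from the adaptation data, which is one plausible reason a short segment could suffice.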
Pages: 39-48
Number of pages: 10