Speaker independent audio-visual continuous speech recognition

Cited by: 0
Authors
Liang, LH [1]
Liu, XX [1]
Zhao, YB [1]
Pi, XB [1]
Nefian, AV [1]
Affiliation
[1] Intel Corp, Microcomp Res Labs, Santa Clara, CA 95052 USA
Source
IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I AND II, PROCEEDINGS | 2002
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The growing number of multimedia applications that require robust speech recognition has generated considerable interest in audio-visual speech recognition (AVSR) systems. The use of visual features in AVSR is justified both by the bimodal (audio and visual) nature of speech production and by the need for features that are invariant to acoustic noise. The speaker-independent audio-visual continuous speech recognition system presented in this paper relies on a robust set of visual features obtained from accurate detection and tracking of the mouth region. The visual and acoustic observation sequences are then integrated using a coupled hidden Markov model (CHMM). The statistical properties of the CHMM allow it to model the asynchrony between audio and visual states while preserving their natural correlation over time. Experimental results on the XM2VTS database show that the system reduces the error rate of the audio-only speech recognition system by over 55% at an SNR of 0 dB.
Pages: A25-A28
Number of pages: 4