Dynamic Bayesian Networks for Audio-Visual Speech Recognition

被引:0
|
作者
Ara V. Nefian
Luhong Liang
Xiaobo Pi
Xiaoxing Liu
Kevin Murphy
机构
[1] Microprocessor Research Labs,Intel Corporation
[2] Microcomputer Research Labs,Intel Corporation
[3] University of California,Computer Science Division
来源
EURASIP Journal on Advances in Signal Processing | / 2002卷
关键词
audio-visual speech recognition; hidden Markov models; coupled hidden Markov models; factorial hidden Markov models; dynamic Bayesian networks;
D O I
暂无
中图分类号
学科分类号
摘要
The use of visual features in audio-visual speech recognition (AVSR) is justified by both the speech generation mechanism, which is essentially bimodal in audio and visual representation, and by the need for features that are invariant to acoustic noise perturbation. As a result, current AVSR systems demonstrate significant accuracy improvements in environments affected by acoustic noise. In this paper, we describe the use of two statistical models for audio-visual integration, the coupled HMM (CHMM) and the factorial HMM (FHMM), and compare the performance of these models with the existing models used in speaker dependent audio-visual isolated word recognition. The statistical properties of both the CHMM and FHMM allow to model the state asynchrony of the audio and visual observation sequences while preserving their natural correlation over time. In our experiments, the CHMM performs best overall, outperforming all the existing models and the FHMM.
引用
收藏
相关论文
共 50 条
  • [1] Dynamic Bayesian networks for audio-visual speech recognition
    Nefian, AV
    Liang, LH
    Pi, XB
    Liu, XX
    Murphy, K
    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2002, 2002 (11) : 1274 - 1288
  • [2] Dynamic Bayesian Networks for audio-visual speaker recognition
    Li, DD
    Yang, YC
    Wu, ZH
    ADVANCES IN BIOMETRICS, PROCEEDINGS, 2006, 3832 : 539 - 545
  • [3] On Dynamic Stream Weighting for Audio-Visual Speech Recognition
    Estellers, Virginia
    Gurban, Mihai
    Thiran, Jean-Philippe
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (04): : 1145 - 1157
  • [4] A Phone-Viseme Dynamic Bayesian Network for Audio-Visual Automatic Speech Recognition
    Terry, Louis
    Katsaggelos, Aggelos K.
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 2597 - 2600
  • [5] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
    Hwang, Jung-Wook
    Park, Jeongkyun
    Park, Rae-Hong
    Park, Hyung-Min
    APPLIED ACOUSTICS, 2023, 211
  • [6] An audio-visual speech recognition with a new mandarin audio-visual database
    Liao, Wen-Yuan
    Pao, Tsang-Long
    Chen, Yu-Te
    Chang, Tsun-Wei
    INT CONF ON CYBERNETICS AND INFORMATION TECHNOLOGIES, SYSTEMS AND APPLICATIONS/INT CONF ON COMPUTING, COMMUNICATIONS AND CONTROL TECHNOLOGIES, VOL 1, 2007, : 19 - +
  • [7] Dynamic stream weight modeling for audio-visual speech recognition
    Marcheret, Etienne
    Libal, Vit
    Potamianos, Gerasimos
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 945 - +
  • [8] Deep Audio-Visual Speech Recognition
    Afouras, Triantafyllos
    Chung, Joon Son
    Senior, Andrew
    Vinyals, Oriol
    Zisserman, Andrew
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12) : 8717 - 8727
  • [9] Audio-visual integration for speech recognition
    Kober, R
    Harz, U
    NEUROLOGY PSYCHIATRY AND BRAIN RESEARCH, 1996, 4 (04) : 179 - 184
  • [10] MULTIPOSE AUDIO-VISUAL SPEECH RECOGNITION
    Estellers, Virginia
    Thiran, Jean-Philippe
    19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 1065 - 1069