Dynamic Bayesian Networks for Audio-Visual Speech Recognition

Cited by: 0

Authors:

Ara V. Nefian

Luhong Liang

Xiaobo Pi

Xiaoxing Liu

Kevin Murphy

Affiliations:

[1] Microprocessor Research Labs, Intel Corporation

[2] Microcomputer Research Labs, Intel Corporation

[3] University of California, Computer Science Division

Source:

EURASIP Journal on Advances in Signal Processing, vol. 2002

Keywords:

audio-visual speech recognition; hidden Markov models; coupled hidden Markov models; factorial hidden Markov models; dynamic Bayesian networks

DOI:

Not available

Abstract:

The use of visual features in audio-visual speech recognition (AVSR) is justified both by the speech generation mechanism, which is essentially bimodal in audio and visual representation, and by the need for features that are invariant to acoustic noise perturbation. As a result, current AVSR systems demonstrate significant accuracy improvements in environments affected by acoustic noise. In this paper, we describe the use of two statistical models for audio-visual integration, the coupled hidden Markov model (CHMM) and the factorial hidden Markov model (FHMM), and compare their performance with that of the existing models used in speaker-dependent audio-visual isolated word recognition. The statistical properties of both the CHMM and the FHMM make it possible to model the state asynchrony of the audio and visual observation sequences while preserving their natural correlation over time. In our experiments, the CHMM performs best overall, outperforming both the FHMM and all the existing models.
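
To make the idea of coupled audio and visual state chains concrete, the following is a minimal sketch of a scaled forward pass for a two-stream coupled HMM. It is not the authors' implementation. It assumes discrete observation symbols for brevity (real AVSR features are continuous-valued), and the function name `chmm_log_likelihood`, the `PARAMS` tuple, and every probability value are hypothetical, chosen only for illustration. Each stream keeps its own hidden state, so the two chains may be in different states at the same time step (asynchrony), while each chain's next state is conditioned on the previous states of both chains (correlation over time).

```python
import math
from itertools import product


def chmm_log_likelihood(pi_a, pi_v, trans_a, trans_v, emit_a, emit_v, obs_a, obs_v):
    """Scaled forward pass for a two-stream coupled HMM (discrete observations).

    trans_a[i][j][k] = P(audio state k at t | audio state i, visual state j at t-1)
    trans_v[i][j][l] = P(visual state l at t | audio state i, visual state j at t-1)
    emit_a[k][o]     = P(audio symbol o | audio state k)
    emit_v[l][o]     = P(visual symbol o | visual state l)
    """
    n_a, n_v = len(pi_a), len(pi_v)
    # Initial joint forward variable over (audio state, visual state) pairs.
    alpha = [[pi_a[i] * pi_v[j] * emit_a[i][obs_a[0]] * emit_v[j][obs_v[0]]
              for j in range(n_v)] for i in range(n_a)]
    log_lik = 0.0
    for t in range(len(obs_a)):
        if t > 0:
            prev = alpha
            alpha = [[0.0] * n_v for _ in range(n_a)]
            for k, l in product(range(n_a), range(n_v)):
                # Each chain's transition depends on BOTH previous states.
                s = sum(prev[i][j] * trans_a[i][j][k] * trans_v[i][j][l]
                        for i in range(n_a) for j in range(n_v))
                alpha[k][l] = s * emit_a[k][obs_a[t]] * emit_v[l][obs_v[t]]
        # Rescale at every step to avoid numerical underflow on long sequences.
        scale = sum(map(sum, alpha))
        log_lik += math.log(scale)
        alpha = [[x / scale for x in row] for row in alpha]
    return log_lik


# Hypothetical toy model: 2 audio states, 2 visual states, 2 symbols per stream.
PARAMS = (
    [0.6, 0.4],                                            # pi_a
    [0.7, 0.3],                                            # pi_v
    [[[0.8, 0.2], [0.6, 0.4]], [[0.3, 0.7], [0.1, 0.9]]],  # trans_a
    [[[0.9, 0.1], [0.4, 0.6]], [[0.5, 0.5], [0.2, 0.8]]],  # trans_v
    [[0.9, 0.1], [0.2, 0.8]],                              # emit_a
    [[0.7, 0.3], [0.1, 0.9]],                              # emit_v
)

if __name__ == "__main__":
    print(chmm_log_likelihood(*PARAMS, [0, 1, 1], [0, 0, 1]))
```

In an isolated word recognizer built along these lines, one such model would be trained per word, and the word whose model gives the highest log-likelihood for the observed audio and visual sequences would be selected. A factorial HMM differs in that each chain evolves independently of the other and the chains are tied together only through a shared observation model.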
