Improved speech recognition using adaptive audio-visual fusion via a stochastic secondary classifier

Cited by: 5
Authors
Lucey, S [1]
Sridharan, S [1]
Chandran, V [1]
Affiliation
[1] Queensland Univ Technol, Sch Elect & Elect Syst Engn, RCSAVT, Speech Res Lab, Brisbane, Qld 4001, Australia
Source
PROCEEDINGS OF 2001 INTERNATIONAL SYMPOSIUM ON INTELLIGENT MULTIMEDIA, VIDEO AND SPEECH PROCESSING | 2001
DOI
10.1109/ISIMP.2001.925455
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The adaptive fusion of video and audio is one of the fundamental pursuits of audio-visual speech recognition (AVSR). In this paper, the use of a high-dimensional secondary classifier over the word likelihood scores from both the audio and video modalities is investigated for the purpose of adaptive fusion. Results are presented that lie on or above the boundary of catastrophic fusion across a range of audio noise levels.
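The abstract does not specify the exact form of the secondary classifier beyond being high-dimensional and stochastic. As a minimal sketch, assuming each word class is modelled by a Gaussian with diagonal covariance (an assumption, not necessarily the authors' model), score-level fusion might work as follows: the N audio and N video word log-likelihood scores for an utterance are concatenated into one 2N-dimensional vector, and the secondary classifier picks the word whose model assigns that vector the highest likelihood. The function names, the toy data, and the variance floor `var_floor` are all hypothetical.

```python
import math

# Hypothetical sketch of score-level fusion with a secondary classifier.
# Assumption: each word class is modelled by a Gaussian with diagonal
# covariance over the 2N-dim vector [audio scores | video scores].

def fit_diag_gaussians(features, labels, var_floor=1e-3):
    """Estimate a mean and a floored diagonal variance for each word label."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    models = {}
    for y, rows in by_class.items():
        n, dim = len(rows), len(rows[0])
        mean = [sum(r[d] for r in rows) / n for d in range(dim)]
        var = [max(sum((r[d] - mean[d]) ** 2 for r in rows) / n, var_floor)
               for d in range(dim)]
        models[y] = (mean, var)
    return models


def log_gaussian(x, mean, var):
    """Log density of x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def classify(models, audio_scores, video_scores):
    """Fuse by classifying the concatenated audio+video score vector."""
    x = list(audio_scores) + list(video_scores)
    return max(models, key=lambda y: log_gaussian(x, *models[y]))


if __name__ == "__main__":
    # Toy 2-word vocabulary: each row is [audio LL per word] + [video LL per word].
    train_x = [
        [-1.0, -5.0, -2.0, -4.0], [-1.2, -5.2, -2.1, -3.9],  # word 0
        [-5.0, -1.0, -4.0, -2.0], [-5.1, -0.9, -4.2, -2.2],  # word 1
    ]
    train_y = [0, 0, 1, 1]
    models = fit_diag_gaussians(train_x, train_y)
    print(classify(models, [-1.1, -5.0], [-2.0, -4.0]))  # prints 0
```

Because each dimension's squared deviation is divided by its learned variance, the classifier effectively weights each modality's scores by how consistent they were in training, rather than through a single hand-set fusion weight. Whether the paper uses this Gaussian form, full covariances, or a different density is not stated in the abstract.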
Pages: 551-554
Number of pages: 4