A Robust Audio-visual Speech Recognition Using Audio-visual Voice Activity Detection

Cited by: 0
Authors
Tamura, Satoshi [1]
Ishikawa, Masato [2]
Hashiba, Takashi [2]
Takeuchi, Shin'ichi [3]
Hayamizu, Satoru [1]
Affiliations
[1] Gifu Univ, Fac Engn, Dept Informat Sci, Gifu, Japan
[2] Gifu Univ, Grad Sch Engn, Dept Informat Sci, Gifu, Japan
[3] Gifu Univ, R&D Ctr Human Med Engn, Gifu, Japan
Source
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 | 2010
Keywords
Audio-visual; speech recognition; voice activity detection; feature fusion; decision fusion
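DOI
Not available
Chinese Library Classification (CLC) code
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract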
This paper proposes a novel speech recognition method that combines Audio-Visual Voice Activity Detection (AVVAD) with Audio-Visual Automatic Speech Recognition (AVASR). AVASR was developed to make ASR more robust in noisy environments by using visual information in addition to acoustic features. Similarly, AVVAD improves the precision of VAD, which detects the presence of speech in an audio signal, under noisy conditions. In our approach, AVVAD is applied as a preprocessing step ahead of an AVASR system, yielding a significantly more robust speech recognizer. To evaluate the proposed system, recognition experiments were conducted on noisy audio-visual data, testing several AVVAD approaches. The results show that the proposed AVASR system using the model-free feature-fusion AVVAD method outperforms not only audio-only ASR without VAD but also conventional AVASR.
Pages: 2702 / +
Number of pages: 2