Two-Layered Audio-Visual Integration in Voice Activity Detection and Automatic Speech Recognition for Robots

Times cited: 0
Authors
Yoshida, Takami [1]
Nakadai, Kazuhiro [1]
Affiliation
[1] Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Tokyo, Japan
Source
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 | 2010
Keywords
audio-visual integration; speech recognition; voice activity detection
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline classification codes
0808; 0809
Abstract
Automatic speech recognition (ASR), which plays an important role in human-robot interaction, must be robust to noise because robots are expected to work in noisy environments. Audio-visual (AV) integration is one of the key approaches to improving robustness in such environments. This paper proposes a two-layered AV integration for ASR that applies AV integration to both voice activity detection (VAD) and the ASR decoding process. We implemented a prototype ASR system based on the proposed two-layered AV integration and evaluated it in dynamically changing situations where audio and/or visual information is noisy or missing. Preliminary results showed that the proposed method improves the robustness of the ASR system even in acoustically or visually contaminated situations.
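The record does not state the fusion rules, so the following is only a minimal Python sketch of the general two-layer idea under stated assumptions. Both layers are assumed to combine per-stream log-likelihood scores with a fixed audio stream weight `w_audio` (a common log-linear, multistream fusion scheme, not necessarily the authors' formulation). A stream that is missing for a frame or hypothesis (`None`) simply drops out. The function names `fuse`, `av_vad`, `speech_segments`, and `av_decode`, the zero VAD threshold, and all example scores are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def fuse(audio: Optional[float], visual: Optional[float], w_audio: float) -> float:
    """Log-linear stream fusion: w_audio * audio + (1 - w_audio) * visual.

    Assumed scheme: a missing stream (None) is dropped and the remaining
    stream is used alone, so the score degrades gracefully.
    """
    if not 0.0 <= w_audio <= 1.0:
        raise ValueError("w_audio must lie in [0, 1]")
    if audio is None and visual is None:
        raise ValueError("both streams are missing")
    if visual is None:
        return audio
    if audio is None:
        return visual
    return w_audio * audio + (1.0 - w_audio) * visual


def av_vad(audio_llr: Sequence[Optional[float]],
           visual_llr: Sequence[Optional[float]],
           w_audio: float = 0.5,
           threshold: float = 0.0) -> List[bool]:
    """Layer 1 (hypothetical AV-VAD): per-frame speech decision from fused
    speech/non-speech log-likelihood ratios."""
    return [fuse(a, v, w_audio) > threshold for a, v in zip(audio_llr, visual_llr)]


def speech_segments(flags: Sequence[bool]) -> List[Tuple[int, int]]:
    """Group consecutive speech frames into (start, end) segments, end exclusive."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(flags)))
    return segments


def av_decode(hypotheses: Sequence[str],
              audio_scores: Dict[str, Optional[float]],
              visual_scores: Dict[str, Optional[float]],
              w_audio: float = 0.5) -> str:
    """Layer 2 (hypothetical AV decoding): choose the hypothesis with the
    highest fused log-likelihood."""
    return max(hypotheses,
               key=lambda h: fuse(audio_scores.get(h), visual_scores.get(h), w_audio))


if __name__ == "__main__":
    # Made-up frame-level log-likelihood ratios; None marks a missing frame.
    audio_llr = [-2.0, 1.5, 2.0, None, -1.0]
    visual_llr = [-1.0, 0.5, None, 1.0, -2.0]
    flags = av_vad(audio_llr, visual_llr, w_audio=0.7)
    print(flags)                   # [False, True, True, True, False]
    print(speech_segments(flags))  # [(1, 4)]

    # Made-up utterance-level word scores for the detected segment.
    hyps = ["hello", "yellow"]
    audio = {"hello": -10.0, "yellow": -9.0}
    visual = {"hello": -4.0, "yellow": -8.0}
    print(av_decode(hyps, audio, visual, w_audio=1.0))  # yellow (audio only)
    print(av_decode(hyps, audio, visual, w_audio=0.5))  # hello
```

With `w_audio=1.0` the decoder relies on audio alone and picks "yellow"; giving the visual stream some weight lets it recover "hello". A fixed weight is used here only for simplicity; how the paper actually weights the streams, and what features its VAD layer uses, is not stated in this record.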
Pages: 2710-2713
Number of pages: 4