Two-Layered Audio-Visual Integration in Voice Activity Detection and Automatic Speech Recognition for Robots

Cited by: 0

Authors:

Yoshida, Takami ^{[1]}

Nakadai, Kazuhiro ^{[1]}

Affiliations:

[1] Tokyo Inst Technol, Grad Sch Informat Sci & Engn, Tokyo, Japan

Source:

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 | 2010

Keywords:

audio-visual integration; speech recognition; voice activity detection

DOI:

Not available

Chinese Library Classification codes:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Automatic Speech Recognition (ASR), which plays an important role in human-robot interaction, should be noise-robust because robots are expected to work in noisy environments. Audio-Visual (AV) integration is one of the key ideas for improving robustness in such environments. This paper proposes a two-layered AV integration for ASR, which applies AV integration to both Voice Activity Detection (VAD) and the ASR decoding process. We implemented a prototype ASR system based on the proposed two-layered AV integration and evaluated it in dynamically changing situations where audio and/or visual information is noisy or missing. Preliminary results showed that the proposed method improves the robustness of the ASR system even when the auditory or visual input is contaminated.

Citation

Pages: 2710-2713

Page count: 4

References (12 in total)

[1] Almajai I., 2008, P EUSIPCO

[2] Asano F, 2003, FUSION 2003: PROCEEDINGS OF THE SIXTH INTERNATIONAL CONFERENCE OF INFORMATION FUSION, VOLS 1 AND 2, P386

[3] Image understanding for iris biometrics: A survey [J]. Bowyer, Kevin W.; Hollingsworth, Karen; Flynn, Patrick J. COMPUTER VISION AND IMAGE UNDERSTANDING, 2008, 110 (02): 281-307

[4] A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER) [J]. Fiscus, JG. 1997 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, PROCEEDINGS, 1997: 347-354

[5] Gravier G, 2002, INT CONF ACOUST SPEE, P853

[6] Robust speech detection method for telephone speech recognition system [J]. Kuroiwa, S; Naito, M; Yamamoto, S; Higuchi, N. SPEECH COMMUNICATION, 1999, 27 (02): 135-148

[7] Murai K, 2003, IEICE T INF SYST, VE86D, P505

[8] Design and Implementation of Robot Audition System 'HARK' - Open Source Software for Listening to Three Simultaneous Speakers [J]. Nakadai, Kazuhiro; Takahashi, Toru; Okuno, Hiroshi G.; Nakajima, Hirofumi; Hasegawa, Yuji; Tsujino, Hiroshi. ADVANCED ROBOTICS, 2010, 24 (5-6): 739-761

[9] OKADA K, 1998, NATO ASI SERIES F

[10] Potamianos Gerasimos, 2008, 2008 Hands-Free Speech Communication and Microphone Arrays (HSCMA '08), P119, DOI 10.1109/HSCMA.2008.4538701
