Audio-Visual Fusion for Sound Source Localization and Improved Attention

Citations: 0
Authors
Lee, Byoung-gi [1]
Choi, JongSuk [1]
Yoon, SangSuk [2]
Choi, Mun-Taek [2]
Kim, Munsang [2]
Kim, Daijin [3]
Affiliations
[1] Korea Inst Sci & Technol, Ctr Cognit Robot Res, Seoul, South Korea
[2] Korea Inst Sci & Technol, Ctr Intelligent Robot, Seoul, South Korea
[3] Postech, Dept Comp Sci & Engn, Pohang, South Korea
Keywords
Audio-Vision Fusion; Sound Source Localization; Human Attention; Robot Tracking
DOI
10.3795/KSME-A.2011.35.7.737
Chinese Library Classification
TH [Machinery and instrument industry];
Discipline classification code
0802;
Abstract
Service robots are equipped with various sensors such as vision cameras, sonar sensors, laser scanners, and microphones. Although each of these sensors has its own function, some of them can be made to work together to perform more complicated functions. Audiovisual fusion is a typical and powerful combination of audio and video sensors, because audio information is complementary to visual information and vice versa. Human beings also depend mainly on visual and auditory information in their daily lives. In this paper, we conduct two studies using audiovisual fusion: one on enhancing the performance of sound localization, and the other on improving robot attention through sound localization and face detection.
Pages: 737-743
Page count: 7
Related papers
9 in total
[1]  
Byoung-gi Lee, 2010, Proceedings 2010 IEEE Workshop on Advanced Robotics and its Social Impacts (ARSO 2010), P176, DOI 10.1109/ARSO.2010.5679699
[2]  
Chan V., 2009, INE NEWSLETTER 0608
[3]   Face detection with the modified census transform [J].
Fröba, B ;
Ernst, A .
SIXTH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION, PROCEEDINGS, 2004, :91-96
[4]  
HAAS H, 1972, J AUDIO ENG SOC, V20, P146
[5]   Sound localization for humanoid robots: Building audio-motor maps based on the HRTF [J].
Hornstein, Jonas ;
Lopes, Manuel ;
Santos-Victor, Jose ;
Lacerda, Francisco .
2006 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-12, 2006, :1170-1176
[6]  
Jun B, 2007, LECT NOTES COMPUT SC, V4642, P29
[7]   Speaker Selection and Tracking in a Cluttered Environment with Audio and Visual Information [J].
Lim, Yoonseob ;
Choi, Jongsuk .
IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2009, 55 (03) :1581-1589
[8]  
Nakadai K., 2001, P EUROSPEECH, P1193
[9]  
ZABIH R, 1994, P EUR C COMP VIS, P151