Speaker Localization among multi-faces in noisy environment by audio-visual Integration

Cited by: 10
Authors
Kim, Hyun-Don [1]
Choi, Jong-Suk [1]
Kim, Munsang [1]
Affiliations
[1] Intelligent Robot Res Ctr, Korea Inst Sci & Technol, Seoul, South Korea
Source
2006 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), VOLS 1-10 | 2006
Keywords
sound localization; face tracking; voice activity detection; human robot interaction; audiovisual integration
DOI
10.1109/ROBOT.2006.1641889
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we developed both a reliable sound localization system, which uses three microphones and includes a VAD (Voice Activity Detection) component, and a face tracking system that uses a vision camera. Moreover, we proposed a way to integrate these systems for human-robot interaction in order to compensate for errors in speaker localization and to effectively reject unnecessary speech or noise signals arriving from undesired directions. To verify the system's performance, we installed the proposed audition and vision systems on a prototype robot called IROBAA (Intelligent ROBot for Active Audition) and showed how to integrate an audio-visual system.
Pages: 1305-1310
Number of pages: 6
Related papers
12 in total
[1] Ahmadi, S; Spanias, AS. Cepstrum-based pitch detection using a new statistical V/UV classification algorithm [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1999, 7 (03): 333-338
[2] HARA I, 2004, P IEEE RSJ INT C INT, P2404
[3] Huang, J; Ohnishi, N; Sugie, N. Sound localization in reverberant environment based on the model of the precedence effect [J]. IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 1997, 46 (04): 842-846
[4] Huang J, 1998, IEEE IMTC P, P330, DOI 10.1109/IMTC.1998.679796
[5] HUANG J, 1994, P IEEE IMTC INT C IN, P967
[6] HUANG J, 1997, P IEEE RSJ INT C INT, P683
[7] Hyun-Don Kim, 2004, 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566), P2411
[8] Jie, H; Kume, K; Saji, A; Nishihashi, M; Watanabe, T; Martens, WL. Robotic spatial sound localization and its 3-D sound human interface [J]. FIRST INTERNATIONAL SYMPOSIUM ON CYBER WORLDS, PROCEEDINGS, 2002: 191-197
[9] KOBAYASHI H, 1988, IEEE APCCAS INT C CI, P299
[10] NAKADAI K, 2001, P IEEE RSJ INT C INT, P1043