Speaker Localization among multi-faces in noisy environment by audio-visual Integration

Cited: 10
Authors
Kim, Hyun-Don [1 ]
Choi, Jong-Suk [1 ]
Kim, Munsang [1 ]
Affiliations
[1] Intelligent Robot Res Ctr, Korea Inst Sci & Technol, Seoul, South Korea
Keywords
sound localization; face tracking; voice activity detection; human robot interaction; audiovisual integration;
DOI
10.1109/ROBOT.2006.1641889
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
In this paper, we developed not only a reliable sound localization system with a VAD (Voice Activity Detection) component using three microphones, but also a face tracking system using a vision camera. Moreover, we proposed a way to integrate these systems for human-robot interaction, compensating for errors in speaker localization and effectively rejecting unwanted speech or noise signals arriving from undesired directions. To verify the system's performance, we installed the proposed audition and vision system on the prototype robot IRORAA (Intelligent ROBot for Active Audition) and showed how to integrate the audio-visual system.
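The abstract does not give implementation details, so the following is only a minimal Python sketch of the integration idea it describes: a sound direction-of-arrival (DOA) estimate is accepted as the speaker's direction only when the VAD flags active speech and the estimate agrees with the bearing of a tracked face; otherwise the signal is rejected as noise from an undesired direction. All function names, the angle convention, and the 15-degree matching tolerance are assumptions made for illustration, not values taken from the paper.

```python
# Hedged sketch of audio-visual speaker selection: fuse a sound DOA estimate
# with tracked face bearings, gated by voice activity detection (VAD).
# Angles are in degrees; the tolerance value is an assumption, not from the paper.

def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def select_speaker(sound_doa: float,
                   face_bearings: list[float],
                   voice_active: bool,
                   tolerance_deg: float = 15.0):
    """Return the bearing of the face matching the sound DOA, or None to reject."""
    if not voice_active:              # VAD gate: ignore frames without speech
        return None
    if not face_bearings:             # no tracked faces -> nothing to match against
        return None
    diff, bearing = min((angular_difference(sound_doa, f), f) for f in face_bearings)
    return bearing if diff <= tolerance_deg else None

# Example: speech detected at 32 deg with faces tracked at 30 and -75 deg.
print(select_speaker(32.0, [30.0, -75.0], voice_active=True))    # -> 30.0 (accepted)
print(select_speaker(120.0, [30.0, -75.0], voice_active=True))   # -> None (rejected)
```

In this sketch the vision side (face bearings) acts as a spatial filter on the audio side, which mirrors the paper's stated goal of rejecting speech or noise coming from directions where no face is present.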
Pages: 1305-1310
Number of pages: 6