Speaker Localization among multi-faces in noisy environment by audio-visual Integration

Cited: 10
Authors
Kim, Hyun-Don [1 ]
Choi, Jong-Suk [1 ]
Kim, Munsang [1 ]
Affiliations
[1] Intelligent Robot Res Ctr, Korea Inst Sci & Technol, Seoul, South Korea
Keywords
sound localization; face tracking; voice activity detection; human robot interaction; audiovisual integration;
DOI
10.1109/ROBOT.2006.1641889
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
In this paper, we developed not only a reliable sound localization system with a VAD (Voice Activity Detection) component using three microphones, but also a face tracking system using a vision camera. Moreover, we proposed a way to integrate these systems for human-robot interaction, compensating for errors in speaker localization and effectively rejecting unwanted speech or noise signals arriving from undesired directions. To verify the system's performance, we installed the proposed audition and vision system on the prototype robot IRORAA (Intelligent ROBot for Active Audition) and showed how to integrate the audio-visual system.
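The abstract does not give implementation details, so the following is only a minimal Python sketch of the integration idea it describes: a sound direction-of-arrival (DOA) estimate is accepted as the speaker's direction only when the VAD flags active speech and the estimate agrees with the bearing of a tracked face; otherwise the signal is rejected as noise from an undesired direction. All function names, the angle convention, and the 15-degree matching tolerance are assumptions made for illustration, not values taken from the paper.

```python
# Hedged sketch of audio-visual speaker selection: fuse a sound DOA estimate
# with tracked face bearings, gated by voice activity detection (VAD).
# Angles are in degrees; the tolerance value is an assumption, not from the paper.

def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def select_speaker(sound_doa: float,
                   face_bearings: list[float],
                   voice_active: bool,
                   tolerance_deg: float = 15.0):
    """Return the bearing of the face matching the sound DOA, or None to reject."""
    if not voice_active:              # VAD gate: ignore frames without speech
        return None
    if not face_bearings:             # no tracked faces -> nothing to match against
        return None
    diff, bearing = min((angular_difference(sound_doa, f), f) for f in face_bearings)
    return bearing if diff <= tolerance_deg else None

# Example: speech detected at 32 deg with faces tracked at 30 and -75 deg.
print(select_speaker(32.0, [30.0, -75.0], voice_active=True))    # -> 30.0 (accepted)
print(select_speaker(120.0, [30.0, -75.0], voice_active=True))   # -> None (rejected)
```

In this sketch the vision side (face bearings) acts as a spatial filter on the audio side, which mirrors the paper's stated goal of rejecting speech or noise coming from directions where no face is present.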
Pages: 1305-1310
Number of pages: 6