Speaker Localization among multi-faces in noisy environment by audio-visual Integration

Cited by: 10

Authors:

Kim, Hyun-Don [1]

Choi, Jong-Suk [1]

Kim, Munsang [1]

Affiliations:

[1] Intelligent Robot Res Ctr, Korea Inst Sci & Technol, Seoul, South Korea

Source:

2006 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), VOLS 1-10 | 2006

Keywords:

sound localization; face tracking; voice activity detection; human robot interaction; audiovisual integration;

DOI:

10.1109/ROBOT.2006.1641889

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

In this paper, we developed both a reliable sound localization system, which uses three microphones and includes a VAD (Voice Activity Detection) component, and a face tracking system that uses a vision camera. Moreover, we proposed a way to integrate these systems for human-robot interaction, both to compensate for errors in speaker localization and to effectively reject unnecessary speech or noise signals arriving from undesired directions. To verify the system's performance, we installed the proposed audition and vision system on a prototype robot called IROBAA (Intelligent ROBot for Active Audition) and showed how to integrate the audio-visual system.


Pages: 1305-1310

Page count: 6

50 items in total

[31] Audio-Visual Feature Fusion for Speaker Identification
Almaadeed, Noor
Aggoun, Amar
Amira, Abbes
NEURAL INFORMATION PROCESSING, ICONIP 2012, PT I, 2012, 7663 : 56 - 67
[32] Aging and audio-visual and multi-cue integration in motion
Roudaia, Eugenie
Sekuler, Allison B.
Bennett, Patrick J.
Sekuler, Robert
FRONTIERS IN PSYCHOLOGY, 2013, 4
[33] Audio-visual speaker identification based on the use of dynamic audio and visual features
Fox, N
Reilly, RB
AUDIO-BASED AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2003, 2688 : 743 - 751
[34] Visual limitations shape audio-visual integration
Perez-Bellido, Alexis
Ernst, Marc O.
Soto-Faraco, Salvador
Lopez-Moliner, Joan
JOURNAL OF VISION, 2015, 15 (14)
[35] Audio-visual target speaker enhancement on multi-talker environment using event-driven cameras
Arriandiaga, Ander
Morrone, Giovanni
Pasa, Luca
Badino, Leonardo
Bartolozzi, Chiara
2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021
[36] AUDIO-VISUAL SPEAKER IDENTIFICATION WITH MULTI-VIEW DISTANCE METRIC LEARNING
Zheng, Haomian
Wang, Meng
Li, Zhu
2010 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, 2010 : 4561 - 4564
[37] A Visual Signal Reliability for Robust Audio-Visual Speaker Identification
Tariquzzaman, Md.
Kim, Jin Young
Na, Seung You
Kim, Hyoung-Gook
Har, Dongsoo
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2011, E94D (10) : 2052 - 2055
[38] Egocentric Audio-Visual Object Localization
Huang, Chao
Tian, Yapeng
Kumar, Anurag
Xu, Chenliang
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023 : 22910 - 22921
[39] Audio-visual integration for speech recognition
Kober, R
Harz, U
NEUROLOGY PSYCHIATRY AND BRAIN RESEARCH, 1996, 4 (04) : 179 - 184
[40] The 'Audio-Visual Face Cover Corpus': Investigations into audio-visual speech and speaker recognition when the speaker's face is occluded by facewear
Fecher, Natalie
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012 : 2247 - 2250
