Speaker Localization among multi-faces in noisy environment by audio-visual Integration

Cited by: 10
Authors
Kim, Hyun-Don [1]
Choi, Jong-Suk [1]
Kim, Munsang [1]
Affiliation
[1] Intelligent Robot Res Ctr, Korea Inst Sci & Technol, Seoul, South Korea
Keywords
sound localization; face tracking; voice activity detection; human-robot interaction; audio-visual integration
DOI
10.1109/ROBOT.2006.1641889
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we developed not only a reliable sound localization system, which uses three microphones and includes a VAD (Voice Activity Detection) component, but also a face tracking system that uses a vision camera. Moreover, we proposed a way to integrate these systems for human-robot interaction, both to compensate for errors in localizing a speaker and to effectively reject unnecessary speech or noise signals arriving from undesired directions. To verify the system's performance, we installed the proposed audition and vision systems on a prototype robot called IRORAA (Intelligent ROBot for Active Audition) and showed how to integrate an audio-visual system.
Pages: 1305-1310
Page count: 6