Audio-Visual Fusion for Sound Source Localization and Improved Attention

Cited by: 0
Authors
Lee, Byoung-gi [1]
Choi, JongSuk [1]
Yoon, SangSuk [2]
Choi, Mun-Taek [2]
Kim, Munsang [2]
Kim, Daijin [3]
Affiliations
[1] Korea Inst Sci & Technol, Ctr Cognit Robot Res, Seoul, South Korea
[2] Korea Inst Sci & Technol, Ctr Intelligent Robot, Seoul, South Korea
[3] Postech, Dept Comp Sci & Engn, Pohang, South Korea
Keywords
Audio-Vision Fusion; Sound Source Localization; Human Attention; Robot Tracking
DOI
10.3795/KSME-A.2011.35.7.737
Chinese Library Classification (CLC) number
TH [Machinery and instrument industry]
Discipline classification code
0802
Abstract
Service robots are equipped with various sensors, such as vision cameras, sonar sensors, laser scanners, and microphones. Although each of these sensors has its own function, some of them can be combined to perform more complex functions. Audiovisual fusion is a typical and powerful combination of audio and video sensors, because audio information complements visual information and vice versa. Humans also rely mainly on visual and auditory information in daily life. In this paper, we conduct two studies using audiovisual fusion: one enhances the performance of sound localization, and the other improves robot attention through sound localization and face detection.
Pages: 737-743
Number of pages: 7