Attention to clapping - A direct method for detecting sound source from video and audio

Cited by: 3
Authors
Ikeda, T [1]
Ishiguro, IE [1]
Asada, M [1]
Affiliations
[1] Osaka Univ, Grad Sch Engn, Dept Adapt Machine Syst, Suita, Osaka 5650871, Japan
Source
PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON MULTISENSOR FUSION AND INTEGRATION FOR INTELLIGENT SYSTEMS | 2003
DOI
10.1109/MFI-2003.2003.1232668
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835
Abstract
Research on using ubiquitous sensors to support human activities has recently attracted major interest. One required feature of a ubiquitous sensor system is that it attends to human signals, such as hand clapping and spoken keywords. To detect and localize these signals, it is useful to fuse visual and audio information. In previous work, sensor fusion is performed at the task level, through separate representations for each sensor, so fusing the sensors yields no new information. This paper proposes an alternative method that fuses sensory signals at the signal level by maximizing mutual information. The fused signal provides new information that cannot be obtained from the individual sensors. As an example, the paper presents two experimental results on sound source localization by audio-visual fusion.
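The abstract does not give the authors' formulation, so the following is only an illustrative sketch of the general idea of signal-level audio-visual fusion by mutual information. It assumes, purely for illustration, that an audio feature and each pixel's time series are jointly Gaussian, in which case the mutual information reduces to -0.5 * log(1 - rho^2), where rho is their correlation. The function names, the synthetic data, and the choice of pixel (5, 2) as the sound source are all hypothetical and are not taken from the paper.

```python
import numpy as np


def gaussian_mutual_information(audio, video):
    """Per-pixel mutual information (nats) between audio and video.

    audio: shape (T,), one audio feature per frame (e.g. energy).
    video: shape (T, H, W), one intensity value per pixel per frame.
    Assumption (not from the paper): each (audio, pixel) pair is jointly
    Gaussian, so I = -0.5 * log(1 - rho^2).
    """
    a = audio - audio.mean()
    v = video - video.mean(axis=0)
    # Sum over time of a[t] * v[t, h, w] for every pixel -> shape (H, W).
    cov = np.tensordot(a, v, axes=(0, 0))
    denom = np.sqrt((a ** 2).sum() * (v ** 2).sum(axis=0))
    # Pixels with zero variance get rho = 0 instead of a division error.
    rho = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    rho2 = np.clip(rho ** 2, 0.0, 1.0 - 1e-12)
    return -0.5 * np.log1p(-rho2)


def localize_source(audio, video):
    """Return the (row, col) of the pixel sharing the most information with the audio."""
    mi = gaussian_mutual_information(audio, video)
    return np.unravel_index(np.argmax(mi), mi.shape)


# Synthetic example: noise everywhere, plus one pixel driven by the audio.
rng = np.random.default_rng(0)
T, H, W = 200, 8, 8
audio = rng.random(T)
video = rng.normal(size=(T, H, W))
video[:, 5, 2] += 5.0 * audio  # hypothetical "clapping" pixel

mi_map = gaussian_mutual_information(audio, video)
estimate = localize_source(audio, video)
print(estimate)
```

The point of the sketch is that the localization map exists only once the two signals are combined: neither the audio feature nor the video alone identifies which pixel is the source.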
Pages: 264-268
Number of pages: 5