Real-time sound source localization and separation based on active audio-visual integration

Cited by: 0

Authors:

Okuno, HG [1]

Nakadai, K [1]

Affiliations:

[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan

Source:

COMPUTATIONAL METHODS IN NEURAL MODELING, PT 1 | 2003 / Vol. 2686

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Robot audition in the real world should cope with environmental noise, reverberation, and motor noise caused by the robot's own movements. This paper presents the active direction-pass filter (ADPF), which separates sounds originating from a specified direction using a pair of microphones. The ADPF is implemented by hierarchical integration of visual and auditory processing, with hypothetical reasoning on the interaural phase difference (IPD) and interaural intensity difference (IID) for each subband. In creating hypotheses, the reference data for IPD and IID are calculated on demand by auditory epipolar geometry. Since the performance of the ADPF depends on the direction, the ADPF controls the direction by motor movement. Human tracking and sound source separation based on the ADPF are implemented on an upper-torso humanoid and run in real time on 4 PCs connected over Gigabit Ethernet. The signal-to-noise ratio (SNR) of each sound separated by the ADPF from a mixture of two speech signals of equal loudness improves from 0 dB to about 10 dB.
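The subband selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a free-field, far-field two-microphone model in place of the paper's auditory epipolar geometry, uses IPD only (IID is omitted), and the microphone spacing, speed of sound, tolerance, and all function names are hypothetical values chosen for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed room-temperature value)
MIC_DISTANCE = 0.18     # m (hypothetical microphone spacing)


def expected_ipd(freqs_hz, theta_rad, mic_distance=MIC_DISTANCE, c=SPEED_OF_SOUND):
    """Reference IPD per subband for a source at azimuth theta.

    Simplified free-field model (an assumption, not the paper's
    auditory epipolar geometry): the right channel lags the left by
    mic_distance * sin(theta) / c seconds.
    """
    return 2.0 * np.pi * freqs_hz * mic_distance * np.sin(theta_rad) / c


def wrap_phase(phi):
    """Wrap a phase (radians) into the interval (-pi, pi]."""
    return np.angle(np.exp(1j * phi))


def direction_pass_mask(left_spec, right_spec, freqs_hz, theta_rad, tol_rad=0.3):
    """Boolean mask of subbands whose observed IPD matches the reference IPD."""
    observed = np.angle(left_spec * np.conj(right_spec))
    reference = expected_ipd(freqs_hz, theta_rad)
    error = np.abs(wrap_phase(observed - reference))
    return error <= tol_rad


def apply_direction_pass(left_spec, right_spec, freqs_hz, theta_rad, tol_rad=0.3):
    """Keep the left-channel subbands attributed to direction theta, zero the rest."""
    mask = direction_pass_mask(left_spec, right_spec, freqs_hz, theta_rad, tol_rad)
    return np.where(mask, left_spec, 0.0)
```

In this sketch, `left_spec` and `right_spec` would be the spectra of one short-time frame from each microphone and `freqs_hz` the matching bin frequencies, for example from `np.fft.rfftfreq`. In the system the abstract describes, the target direction would come from the visual and auditory tracking stages; here `theta_rad` is simply passed in by the caller.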

Pages: 118 - 125

Page count: 8
