Real-time sound source localization and separation based on active audio-visual integration

Authors
Okuno, HG [1]
Nakadai, K [1]
Affiliation
[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan
Source
COMPUTATIONAL METHODS IN NEURAL MODELING, PT 1 | 2003 / Vol. 2686
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Robot audition in the real world must cope with environmental noise, reverberation, and motor noise caused by the robot's own movements. This paper presents the active direction-pass filter (ADPF), which uses a pair of microphones to separate sounds originating from a specified direction. The ADPF is implemented by hierarchically integrating visual and auditory processing, with hypothetical reasoning on the interaural phase difference (IPD) and interaural intensity difference (IID) of each subband. When hypotheses are created, the reference IPD and IID values are calculated on demand from auditory epipolar geometry. Because the performance of the ADPF depends on the direction of the sound source, the ADPF controls that direction relative to the robot by motor movement. Human tracking and sound source separation based on the ADPF are implemented on an upper-torso humanoid and run in real time on 4 PCs connected over Gigabit Ethernet. For a mixture of two speech signals of equal loudness, the ADPF improves the signal-to-noise ratio (SNR) of each separated sound from 0 dB to about 10 dB.
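The direction-pass idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It uses only IPD (no IID, no hypothetical reasoning, no vision) and works on a single frame of precomputed subband spectra. It also replaces the paper's auditory epipolar geometry with an assumed free-field two-microphone model in which the path difference is d·sin θ. The names `expected_ipd`, `direction_pass_mask`, `apply_mask` and `synthetic_frame` are hypothetical, as are the 343 m/s speed of sound, the 0.18 m microphone spacing, and the 0.2 rad tolerance.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def expected_ipd(freq_hz, theta_rad, mic_distance_m):
    """Reference IPD (radians) for a source at theta.

    Assumed free-field model, not the paper's auditory epipolar geometry:
    path difference = d * sin(theta). Positive theta means the sound
    reaches the left microphone first (an assumed convention).
    """
    delay_s = mic_distance_m * math.sin(theta_rad) / SPEED_OF_SOUND
    return 2.0 * math.pi * freq_hz * delay_s


def wrap_phase(phi):
    """Wrap a phase value into [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def direction_pass_mask(left, right, freqs_hz, theta_rad, mic_distance_m, tolerance_rad):
    """True for each subband whose observed IPD matches the target direction."""
    mask = []
    for l_bin, r_bin, f in zip(left, right, freqs_hz):
        observed = cmath.phase(l_bin) - cmath.phase(r_bin)
        expected = expected_ipd(f, theta_rad, mic_distance_m)
        mask.append(abs(wrap_phase(observed - expected)) <= tolerance_rad)
    return mask


def apply_mask(spectrum, mask):
    """Zero the subbands that did not pass the direction filter."""
    return [x if keep else 0j for x, keep in zip(spectrum, mask)]


# Example: one frame, three subbands, hypothetical 0.18 m microphone spacing.
FREQS = [250.0, 500.0, 1000.0]
D = 0.18
TARGET = math.radians(30.0)
INTERFERER = math.radians(-30.0)


def synthetic_frame(theta_rad):
    """Left/right spectra of a unit-amplitude source arriving from theta."""
    left = [1 + 0j for _ in FREQS]
    right = [cmath.exp(-1j * expected_ipd(f, theta_rad, D)) for f in FREQS]
    return left, right


left_t, right_t = synthetic_frame(TARGET)
left_i, right_i = synthetic_frame(INTERFERER)
print(direction_pass_mask(left_t, right_t, FREQS, TARGET, D, 0.2))  # [True, True, True]
print(direction_pass_mask(left_i, right_i, FREQS, TARGET, D, 0.2))  # [False, False, False]
```

A tight tolerance rejects the interfering source in this example. At higher frequencies, however, the IPD wraps around 2π, so sources in different directions can produce nearly identical phase differences. This is one reason a practical filter may also check IID rather than rely on IPD alone.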
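The SNR figures in the abstract are on a decibel scale. The sketch below assumes that SNR is the ratio of mean target power to mean interference power, which is the standard definition. The paper's exact measurement procedure is not given in this record, and the names `mean_power` and `snr_db` are hypothetical. Under this definition, two equally loud sources give 0 dB, and 10 dB means that the residual interference has one tenth of the target's power.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal, noise):
    """SNR in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


target = [1.0, -1.0, 1.0, -1.0]
interference = [-1.0, 1.0, -1.0, 1.0]  # same loudness as the target
residual = [x * math.sqrt(0.1) for x in interference]  # power reduced to 1/10

print(round(snr_db(target, interference), 6))  # 0.0
print(round(snr_db(target, residual), 6))  # 10.0
```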
Pages: 118-125
Number of pages: 8
Related papers
50 in total
  • [1] Real-time speaker localization and speech separation by audio-visual integration
    Nakadai, K
    Hidai, K
    Okuno, HG
    Kitano, H
    2002 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS I-IV, PROCEEDINGS, 2002, : 1043 - 1049
  • [2] Audio-Visual Spatial Integration and Recursive Attention for Robust Sound Source Localization
    Um, Sung Jin
    Kim, Dongjin
    Kim, Jung Uk
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 3507 - 3516
  • [3] Active Audio-Visual Separation of Dynamic Sound Sources
    Majumder, Sagnik
    Grauman, Kristen
    COMPUTER VISION, ECCV 2022, PT XXXIX, 2022, 13699 : 551 - 569
  • [4] Real-time sound source localization based on audiovisual frequency integration
    Tsuji, Tokuo
    Yamamoto, Kenkichi
    Ishii, Idaku
    18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 4, PROCEEDINGS, 2006, : 322 - +
  • [5] Real-time source separation based on sound localization in a reverberant environment
    Aoki, M
    Furuya, K
    NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS, 2002, : 475 - 484
  • [6] AUDIO-VISUAL DISCREPANCY AND THE INFLUENCE ON VERTICAL SOUND SOURCE LOCALIZATION
    Werner, Stephan
    Liebetrau, Judith
    Sporer, Thomas
    2012 Fourth International Workshop on Quality of Multimedia Experience (QoMEX), 2012, : 133 - 139
  • [7] Audio-Visual Fusion for Sound Source Localization and Improved Attention
    Lee, Byoung-gi
    Choi, JongSuk
    Yoon, SangSuk
    Choi, Mun-Taek
    Kim, Munsang
    Kim, Daijin
    TRANSACTIONS OF THE KOREAN SOCIETY OF MECHANICAL ENGINEERS A, 2011, 35 (07) : 737 - 743
  • [8] Information-Driven Active Audio-Visual Source Localization
    Schult, Niclas
    Reineking, Thomas
    Kluss, Thorsten
    Zetzsche, Christoph
    PLOS ONE, 2015, 10 (09):
  • [9] Visually Guided Sound Source Separation With Audio-Visual Predictive Coding
    Song, Zengjie
    Zhang, Zhaoxiang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (11) : 15528 - 15542
  • [10] Real-Time Sound Source Localization
    Mandlik, Michal
    Nemec, Zdenek
    Dolecek, Radovan
    2012 13TH INTERNATIONAL RADAR SYMPOSIUM (IRS), 2012, : 322 - 325