AUDIO-VISUAL SPEAKER LOCALIZATION VIA WEIGHTED CLUSTERING

Times cited: 0
Authors
Gebru, Israel D. [1 ]
Alameda-Pineda, Xavier [1 ]
Horaud, Radu [1 ]
Forbes, Florence [1 ]
Affiliation
[1] INRIA Grenoble Rhone Alpes, Grenoble, France
Source
2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP) | 2014
Funding
European Union Seventh Framework Programme (FP7);
Keywords
Mixture models; audiovisual fusion; multimodal signal processing; weighted-data clustering;
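DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract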
In this paper we address the problem of detecting and localizing speakers using audiovisual data, and we cast it as a clustering problem. We propose a novel weighted-data clustering method based on a finite mixture model in which observations are weighted non-uniformly. Weighted-data clustering techniques have been proposed before, but not in the generative setting presented here. We introduce a weighted-data mixture model and formally derive the associated EM procedure. The clustering algorithm is applied to detecting and localizing a speaker over time using visual and auditory observations gathered with a single camera and two microphones. Audiovisual fusion is enforced through a cross-modal weighting scheme. We test the robustness of the method with experiments in two challenging scenarios: disambiguating between an active and a non-active speaker, and associating a speech signal with a person.
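The weighted-data idea in the abstract can be illustrated with a minimal one-dimensional sketch. It assumes, for illustration only, that each observation x_i carries a fixed positive weight w_i that divides the component variance, i.e. x_i ~ sum_k pi_k N(mu_k, sigma_k^2 / w_i). The function names (log_normal, weighted_data_em), the initialization and the numerical guards are hypothetical and are not taken from the paper. Under this assumption the M-step reduces to weight-scaled means and variances, so a low-weight observation (such as an outlier) barely moves the cluster centres.

```python
import math


def log_normal(x, mean, var):
    """Log-density of a 1-D Gaussian N(mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def weighted_data_em(xs, ws, means, variances, priors, n_iter=100, min_var=1e-6):
    """Illustrative EM for a 1-D weighted-data Gaussian mixture.

    Assumed model (for illustration only): x_i ~ sum_k pi_k N(mu_k, sigma_k^2 / w_i),
    where each fixed weight w_i > 0 shrinks (w_i > 1) or inflates (w_i < 1)
    the variance seen by observation i.
    """
    means, variances, priors = list(means), list(variances), list(priors)
    n, k = len(xs), len(means)
    for _ in range(n_iter):
        # E-step: responsibilities r_ik, using the per-observation variance
        # sigma_k^2 / w_i and a log-sum-exp shift for numerical stability.
        resp = []
        for x, w in zip(xs, ws):
            logs = [math.log(priors[j]) + log_normal(x, means[j], variances[j] / w)
                    for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: closed-form updates maximising the expected
        # complete-data log-likelihood under the assumed model.
        for j in range(k):
            r = [resp[i][j] for i in range(n)]
            nk = max(sum(r), 1e-12)  # guard against an empty component
            wr = [r[i] * ws[i] for i in range(n)]
            swr = max(sum(wr), 1e-12)
            means[j] = sum(wr[i] * xs[i] for i in range(n)) / swr
            variances[j] = max(
                sum(wr[i] * (xs[i] - means[j]) ** 2 for i in range(n)) / nk, min_var)
            priors[j] = nk / n
    return means, variances, priors
```

In a toy run with xs = [-0.5, 0.0, 0.5, 9.5, 10.0, 10.5, 30.0], weight 0.01 on the outlier at 30.0 (weight 1 elsewhere) and initial means [1.0, 8.0], the second centre settles close to 10; giving the outlier weight 1 instead pulls that centre well above 10.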
Pages: 6