AUDIO-VISUAL SPEAKER LOCALIZATION VIA WEIGHTED CLUSTERING

Cited by: 0

Authors:

Gebru, Israel D. [1]

Alameda-Pineda, Xavier [1]

Horaud, Radu [1]

Forbes, Florence [1]

Affiliations:

[1] INRIA Grenoble Rhone Alpes, Grenoble, France

Source:

2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP) | 2014

Funding:

European Union Seventh Framework Programme (FP7);

Keywords:

Mixture models; audiovisual fusion; multimodal signal processing; weighted-data clustering;

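DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract: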

In this paper we address the problem of detecting and locating speakers using audiovisual data, and we cast it as a clustering problem. We propose a novel weighted clustering method based on a finite mixture model which explores the idea of non-uniform weighting of observations. Weighted-data clustering techniques have already been proposed, but not in a generative setting as presented here. We introduce a weighted-data mixture model and we formally devise the associated EM procedure. The clustering algorithm is applied to the problem of detecting and localizing a speaker over time using both visual and auditory observations gathered with a single camera and two microphones. Audiovisual fusion is enforced by introducing a cross-modal weighting scheme. We test the robustness of the method with experiments in two challenging scenarios: disambiguating between an active and a non-active speaker, and associating a speech signal with a person.
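
The abstract names a weighted-data mixture model and an associated EM procedure but does not give the equations. The sketch below illustrates one plausible formulation and should not be read as the authors' exact algorithm. It assumes that each observation x_i carries a fixed positive weight w_i that scales the precision of its Gaussian component, i.e. x_i given cluster k follows N(mu_k, Sigma_k / w_i). The function names (log_gauss, weighted_em), the init_means parameter, and the regularization term reg are hypothetical choices made for this illustration.

```python
import numpy as np

def log_gauss(X, mean, cov, w):
    """Log-density of N(x_i; mean, cov / w_i) for each row x_i of X."""
    d = X.shape[1]
    diff = X - mean
    L = np.linalg.cholesky(cov)
    sol = np.linalg.solve(L, diff.T)            # shape (d, n)
    maha = np.sum(sol ** 2, axis=0)             # Mahalanobis distance under cov
    logdet = 2.0 * np.sum(np.log(np.diag(L)))   # log-determinant of cov
    # Scaling cov by 1 / w_i subtracts d * log(w_i) from the log-determinant
    # and multiplies the Mahalanobis distance by w_i.
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet - d * np.log(w) + w * maha)

def weighted_em(X, w, k, n_iter=50, init_means=None, seed=0, reg=1e-6):
    """EM for a Gaussian mixture with fixed per-observation weights (assumed model)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if init_means is None:
        means = X[rng.choice(n, size=k, replace=False)].astype(float)
    else:
        means = np.array(init_means, dtype=float)
    base_cov = np.atleast_2d(np.cov(X.T)) + reg * np.eye(d)
    covs = np.stack([base_cov] * k)
    pis = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities, normalized in the log domain for stability.
        log_r = np.stack(
            [np.log(pis[j]) + log_gauss(X, means[j], covs[j], w) for j in range(k)],
            axis=1,
        )
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: the weights enter the mean and covariance updates.
        for j in range(k):
            wr = w * r[:, j]
            means[j] = wr @ X / wr.sum()
            diff = X - means[j]
            covs[j] = (wr[:, None] * diff).T @ diff / r[:, j].sum() + reg * np.eye(d)
        pis = r.mean(axis=0)
    return pis, means, covs, r
```

With all weights equal to one, this reduces to standard EM for a Gaussian mixture. A cross-modal scheme of the kind the abstract mentions could, for instance, give larger weights to visual observations that are supported by nearby auditory observations, although how the weights are actually computed is not specified here.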

Pages: 6
