Deep Metric Learning-Assisted 3D Audio-Visual Speaker Tracking via Two-Layer Particle Filter

Times cited: 1

Authors:

Li, Yidi [1]

Liu, Hong [1]

Yang, Bing [1]

Ding, Runwei [2]

Chen, Yang [3]

Affiliations:

[1] Peking Univ, Shenzhen Grad Sch, Key Lab Machine Percept, Shenzhen 518055, Peoples R China

[2] Chongqing Univ Technol, Sch Artificial Intelligence, Chongqing 401135, Peoples R China

[3] Yanka Kupala State Univ Grodno, Grodno, BELARUS

Source:

COMPLEXITY | 2020 / Vol. 2020

Funding:

National Natural Science Foundation of China;

Keywords:

LOCALIZATION;

DOI:

10.1155/2020/3764309

Chinese Library Classification (CLC) number:

O1 [Mathematics];

Subject classification codes:

0701; 070101;

Abstract:

For speaker tracking, integrating multimodal information from audio and video provides an effective and promising solution. The main current challenge lies in constructing a stable observation model. To this end, we propose a 3D audio-visual speaker tracker assisted by deep metric learning within a two-layer particle filter framework. First, an audio-guided motion model generates candidate samples in a hierarchical structure consisting of an audio layer and a visual layer. Then, a stable observation model is built with a purpose-designed Siamese network, which provides a similarity-based likelihood for computing particle weights. The speaker position is estimated from an optimal particle set that integrates the decisions of the audio particles and the visual particles. Finally, a template update strategy based on a long short-term mechanism is adopted to prevent drift during tracking. Experimental results demonstrate that the proposed method outperforms single-modal trackers and the comparison methods, achieving efficient and robust tracking both in 3D space and on the image plane.
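The abstract outlines one tracking step: candidates drawn in an audio layer and a visual layer, similarity-based particle weights, a fused particle set, and a template update. The following is a minimal Python sketch of that loop under stated assumptions, not the authors' implementation. The Gaussian proposals, the weight mapping `exp(lam * (s - 1))`, taking the top-k weighted particles as the "optimal particle set", and the threshold-gated short-term template update are all hypothetical choices; every function name here is illustrative, and a toy distance-based score stands in for the Siamese network.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def jitter(center: Vec3, sigma: float, rng: random.Random) -> Vec3:
    """Draw one 3D sample from an isotropic Gaussian around `center`."""
    return (center[0] + rng.gauss(0.0, sigma),
            center[1] + rng.gauss(0.0, sigma),
            center[2] + rng.gauss(0.0, sigma))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def likelihood(sim: float, lam: float = 10.0) -> float:
    """ASSUMED mapping from a similarity score to an unnormalized weight."""
    return math.exp(lam * (sim - 1.0))


def track_step(prev_state: Vec3, audio_estimate: Vec3,
               similarity_fn: Callable[[Vec3], float], rng: random.Random,
               n_audio: int = 50, n_visual: int = 50, top_k: int = 30,
               sigma_audio: float = 0.2, sigma_visual: float = 0.2) -> Vec3:
    # Audio layer (assumption): candidates drawn around the audio position estimate.
    audio = [jitter(audio_estimate, sigma_audio, rng) for _ in range(n_audio)]
    # Visual layer (assumption): candidates propagated from the previous state.
    visual = [jitter(prev_state, sigma_visual, rng) for _ in range(n_visual)]
    # Weight every candidate from both layers with the similarity-based likelihood.
    weighted = [(likelihood(similarity_fn(p)), p) for p in audio + visual]
    # Hypothetical fusion rule: keep the top_k highest-weighted particles.
    weighted.sort(key=lambda wp: wp[0], reverse=True)
    best = weighted[:top_k]
    total = sum(w for w, _ in best)
    # Position estimate = normalized weighted mean of the selected particles.
    return tuple(sum(w * p[i] for w, p in best) / total for i in range(3))


def update_template(short_term: List[float], current: Sequence[float],
                    sim: float, alpha: float = 0.1,
                    threshold: float = 0.8) -> List[float]:
    """Blend the current embedding into the short-term template only when the
    match is confident; the long-term template is kept fixed (assumption)."""
    if sim < threshold:
        return list(short_term)
    return [(1.0 - alpha) * s + alpha * c for s, c in zip(short_term, current)]


def template_similarity(feature: Sequence[float], long_term: Sequence[float],
                        short_term: Sequence[float]) -> float:
    """Score a candidate against both templates and keep the better match."""
    return max(cosine_similarity(feature, long_term),
               cosine_similarity(feature, short_term))


if __name__ == "__main__":
    truth = (1.0, 2.0, 1.5)  # hypothetical ground-truth speaker position (m)

    def toy_similarity(p: Vec3) -> float:
        # Toy stand-in for a Siamese-network score: decays with distance to truth.
        return math.exp(-math.dist(p, truth) ** 2)

    est = track_step((0.9, 2.1, 1.6), (1.2, 1.9, 1.4), toy_similarity,
                     random.Random(0))
    print(est)
```

In a real tracker, `similarity_fn` would score the image patch at each candidate's projected position against the templates (for example with `template_similarity` on learned embeddings), and `audio_estimate` would come from an audio localization front end; both are outside this sketch. Gating the short-term update on a confidence threshold while keeping a fixed long-term template is one common way to limit drift, chosen here only for illustration.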


Pages: 8
