Deep Metric Learning-Assisted 3D Audio-Visual Speaker Tracking via Two-Layer Particle Filter

Cited by: 1
Authors
Li, Yidi [1]
Liu, Hong [1]
Yang, Bing [1]
Ding, Runwei [2]
Chen, Yang [3]
Affiliations
[1] Peking Univ, Shenzhen Grad Sch, Key Lab Machine Percept, Shenzhen 518055, Peoples R China
[2] Chongqing Univ Technol, Sch Artificial Intelligence, Chongqing 401135, Peoples R China
[3] Yanka Kupala State Univ Grodno, Grodno, BELARUS
Funding
National Natural Science Foundation of China
Keywords
LOCALIZATION;
DOI
10.1155/2020/3764309
Chinese Library Classification number
O1 [Mathematics]
Subject classification codes
0701; 070101
Abstract
For speaker tracking, integrating multimodal information from audio and video provides an effective and promising solution. The main current challenge is constructing a stable observation model. To this end, we propose a 3D audio-visual speaker tracker assisted by deep metric learning within a two-layer particle filter framework. First, an audio-guided motion model generates candidate samples in a hierarchical structure consisting of an audio layer and a visual layer. Then, a stable observation model is built with a purpose-designed Siamese network, which provides a similarity-based likelihood for calculating particle weights. The speaker position is estimated using an optimal particle set, which integrates the decisions from audio particles and visual particles. Finally, a template update strategy based on a long short-term mechanism is adopted to prevent drift during tracking. Experimental results demonstrate that the proposed method outperforms the single-modal trackers and comparison methods. Efficient and robust tracking is achieved both in 3D space and on the image plane.
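The weighting and estimation steps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The exponential mapping from similarity to likelihood, the `sharpness` and `keep` parameters, and the distance-based `toy_similarity` (a stand-in for the Siamese network output) are all assumptions made for illustration. The sketch pools audio-layer and visual-layer particles, weights them by a similarity score, and averages the highest-weighted subset.

```python
import math
import random


def normalize(weights):
    """Scale weights to sum to 1; fall back to uniform if all are zero."""
    total = sum(weights)
    if total <= 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def similarity_likelihood(similarity, sharpness=5.0):
    """Hypothetical mapping from a similarity score to an unnormalized likelihood."""
    return math.exp(sharpness * similarity)


def toy_similarity(particle, target, scale=1.0):
    """Stand-in for a learned similarity: close to 1 near the target, decaying with distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(particle, target))
    return math.exp(-d2 / (2.0 * scale ** 2))


def estimate_position(particles, similarities, keep=0.5):
    """Weighted mean over the top `keep` fraction of particles, ranked by weight."""
    weights = normalize([similarity_likelihood(s) for s in similarities])
    ranked = sorted(zip(weights, particles), key=lambda wp: wp[0], reverse=True)
    best = ranked[:max(1, int(len(ranked) * keep))]
    w = normalize([wt for wt, _ in best])
    dim = len(best[0][1])
    return tuple(
        sum(wi * p[d] for wi, (_, p) in zip(w, best)) for d in range(dim)
    )


def demo(seed=0):
    """Pool noisy audio and visual particles around a known 3D target and estimate it."""
    rng = random.Random(seed)
    target = (1.0, 2.0, 1.5)
    # Assumption: audio particles are coarser (larger spread) than visual ones.
    audio = [tuple(t + rng.gauss(0, 0.5) for t in target) for _ in range(100)]
    visual = [tuple(t + rng.gauss(0, 0.2) for t in target) for _ in range(100)]
    particles = audio + visual
    sims = [toy_similarity(p, target) for p in particles]
    return target, estimate_position(particles, sims)
```

Calling `demo()` returns the synthetic target and the estimate obtained from the pooled particle set. In a real tracker the similarity scores would come from comparing each particle's observation against a template, not from the ground-truth position used here.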
Pages: 8