An audio-visual particle filter for speaker tracking on the CLEAR'06 evaluation dataset

Cited by: 0
Authors
Nickel, Kai [1]
Gehrig, Tobias [1]
Ekenel, Hazim K. [1]
McDonough, John [1]
Stiefelhagen, Rainer [1]
Affiliations
[1] Univ Karlsruhe, Interact Sys Labs, D-76131 Karlsruhe, Germany
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present an approach for tracking a lecturer during the course of his speech. We use features from multiple cameras and microphones and process them in a joint particle filter framework. The filter performs sampled projections of 3D location hypotheses and scores them using features from both audio and video. On the video side, the features are based on foreground segmentation, multi-view face detection, and upper-body detection. On the audio side, the time delays of arrival between pairs of microphones are estimated with a generalized cross-correlation function. In the CLEAR'06 evaluation, the system yielded a tracking accuracy (MOTA) of 71% for video-only, 55% for audio-only, and 90% for combined audio-visual tracking.
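The audio-side step mentioned in the abstract, estimating the time delay of arrival between a pair of microphones from a generalized cross-correlation function, can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract does not say which weighting is used, so the PHAT weighting here is an assumption, and the function name `gcc_phat_delay` and its parameters are hypothetical.

```python
import numpy as np


def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref`.

    Uses generalized cross-correlation with PHAT weighting (an assumed
    choice; the abstract only says "generalized cross correlation").
    A positive result means `sig` arrives later than `ref`.
    """
    sig = np.asarray(sig, dtype=float)
    ref = np.asarray(ref, dtype=float)

    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    spec_sig = np.fft.rfft(sig, n=n)
    spec_ref = np.fft.rfft(ref, n=n)

    # Cross-power spectrum, normalized to unit magnitude (phase only).
    cross = spec_sig * np.conj(spec_ref)
    cross /= np.abs(cross) + 1e-12

    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if requested.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so the lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs
```

In a tracker of the kind the abstract describes, a delay estimated this way for each microphone pair could be compared with the delay implied by a 3D location hypothesis in order to score that hypothesis. The abstract does not give the scoring function, so that step is left out here.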
Pages: 69-80
Number of pages: 12