Audio-Visual Multi-Speaker Tracking Based On the GLMB Framework

Cited by: 4
Authors
Lin, Shoufeng [1]
Qian, Xinyuan [1]
Affiliations
[1] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
Source
INTERSPEECH 2020 | 2020
Funding
National Research Foundation, Singapore;
Keywords
multi-speaker tracking; 3D; audio-visual fusion; GLMB filter; RANDOM FINITE SETS; PARTICLE;
DOI
10.21437/Interspeech.2020-1969
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
Multi-speaker tracking using both audio and video modalities is a key task in human-robot interaction and video conferencing. The complementary nature of audio and video signals improves the tracking robustness against noise and outliers compared to the uni-modal approaches. However, the online tracking of multiple speakers via audio-video fusion, especially without the target number prior, is still an open challenge. In this paper, we propose a Generalized Labelled Multi-Bernoulli (GLMB)-based framework that jointly estimates the number of targets and their respective states online. Experimental results using the AV16.3 dataset demonstrate the effectiveness of the proposed method.
Pages: 3082-3086
Page count: 5
Related papers
50 in total
  • [1] ACCOUNTING FOR ROOM ACOUSTICS IN AUDIO-VISUAL MULTI-SPEAKER TRACKING
    Ban, Yutong
    Li, Xiaofei
    Alameda-Pineda, Xavier
    Girin, Laurent
    Horaud, Radu
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6553 - 6557
  • [2] Multi-Speaker Tracking From an Audio-Visual Sensing Device
    Qian, Xinyuan
    Brutti, Alessio
    Lanz, Oswald
    Omologo, Maurizio
    Cavallaro, Andrea
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (10) : 2576 - 2588
  • [3] Multi-Speaker Audio-Visual Corpus RUSAVIC: Russian Audio-Visual Speech in Cars
    Ivanko, Denis
    Ryumin, Dmitry
    Axyonov, Alexandr
    Kashevnik, Alexey
    Karpov, Alexey
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 1555 - 1559
  • [4] Integration of audio-visual information for multi-speaker multimedia speaker recognition
    Yang, Jichen
    Chen, Fangfan
    Cheng, Yu
    Lin, Pei
    DIGITAL SIGNAL PROCESSING, 2024, 145
  • [5] Particle Flow SMC-PHD Filter for Audio-Visual Multi-speaker Tracking
    Liu, Yang
    Wang, Wenwu
    Chambers, Jonathon
    Kilic, Volkan
    Hilton, Adrian
    LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION (LVA/ICA 2017), 2017, 10169 : 344 - 353
  • [6] Audio-Visual Particle Flow SMC-PHD Filtering for Multi-Speaker Tracking
    Liu, Yang
    Kilic, Volkan
    Guan, Jian
    Wang, Wenwu
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (04) : 934 - 948
  • [7] Exploiting the Complementarity of Audio and Visual Data in Multi-Speaker Tracking
    Ban, Yutong
    Girin, Laurent
    Alameda-Pineda, Xavier
    Horaud, Radu
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017), 2017, : 446 - 454
  • [8] Audio Visual Multi-Speaker Tracking with Improved GCF and PMBM Filter
    Zhao, Jinzheng
    Wu, Peipei
    Liu, Xubo
    Goudarzi, Shidrokh
    Liu, Haohe
    Xu, Yong
    Wang, Wenwu
    INTERSPEECH 2022, 2022, : 3704 - 3708
  • [9] NON-ZERO DIFFUSION PARTICLE FLOW SMC-PHD FILTER FOR AUDIO-VISUAL MULTI-SPEAKER TRACKING
    Liu, Yang
    Hilton, Adrian
    Chambers, Jonathon
    Zhao, Yuxin
    Wang, Wenwu
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4304 - 4308
  • [10] Speaker Tracking Based on Audio-Visual Fusion with Unknown Noise
    Cao, Jie
    Li, Jun
    Li, Wei
    PROCEEDINGS OF 2013 CHINESE INTELLIGENT AUTOMATION CONFERENCE: INTELLIGENT INFORMATION PROCESSING, 2013, 256 : 215 - 226