Audio-visual speaker localization using graphical models

Cited by: 0
Authors
Kushal, Akash [1]
Rahurkar, Mandar [2]
Li, Fei-Fei [2]
Ponce, Jean [1]
Huang, Thomas [2]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
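Source
18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS | 2006
Funding
National Science Foundation (USA)
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract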
In this work we propose an approach to combine audio and video modalities for person tracking using graphical models. We demonstrate a principled and intuitive framework for combining these modalities to obtain robustness against occlusion and change in appearance. We further exploit the temporal correlations that exist for a moving object between adjacent frames to account for the cases where having both modalities might still not be enough, e.g., when the person being tracked is occluded and not speaking. Improvement in tracking results is shown at each step and compared with manually annotated ground truth.
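The fusion idea in the abstract can be illustrated with a small sketch. This is not the authors' model: it assumes a hypothetical one-dimensional grid of candidate speaker positions, treats the audio and video observations as conditionally independent given the position, and uses a simple Gaussian-shaped transition matrix for the temporal link between adjacent frames. The names `make_transition` and `forward_filter` and all numbers are illustrative assumptions.

```python
import numpy as np

def make_transition(n_states, sigma=1.0):
    """Row-stochastic matrix: the speaker tends to stay near its last position."""
    idx = np.arange(n_states)
    diff = idx[:, None] - idx[None, :]
    trans = np.exp(-0.5 * (diff / sigma) ** 2)
    return trans / trans.sum(axis=1, keepdims=True)

def forward_filter(audio_lik, video_lik, transition, prior=None):
    """Per-frame posterior over position given both modalities.

    audio_lik, video_lik: arrays of shape (n_frames, n_states) holding
    p(observation | position). A flat row means that modality carries no
    information in that frame (e.g. silence or occlusion).
    """
    n_frames, n_states = audio_lik.shape
    belief = np.full(n_states, 1.0 / n_states) if prior is None else prior
    posteriors = np.empty((n_frames, n_states))
    for t in range(n_frames):
        if t > 0:
            belief = belief @ transition               # temporal prediction
        belief = belief * audio_lik[t] * video_lik[t]  # fuse the two modalities
        belief = belief / belief.sum()                 # renormalize
        posteriors[t] = belief
    return posteriors

# Illustrative data (assumed, not from the paper): 3 frames, 5 positions.
flat = np.full(5, 0.2)
audio = np.array([[0.1, 0.6, 0.1, 0.1, 0.1],   # frame 0: speech heard near position 1
                  flat,                        # frame 1: speaker is silent
                  flat])                       # frame 2: speaker is silent
video = np.array([[0.05, 0.8, 0.05, 0.05, 0.05],  # frame 0: face seen at position 1
                  flat,                           # frame 1: speaker is occluded
                  [0.05, 0.05, 0.8, 0.05, 0.05]]) # frame 2: face seen at position 2

posteriors = forward_filter(audio, video, make_transition(5, sigma=1.0))
print(posteriors.argmax(axis=1))
```

In frame 1 neither modality is informative, so the estimate is carried entirely by the temporal prediction from frame 0, which is the situation the abstract describes for an occluded, silent speaker.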
Pages: 291 / +
Page count: 2
Related papers
50 in total
  • [1] Deep Audio-Visual Beamforming for Speaker Localization
    Qian, Xinyuan
    Zhang, Qiquan
    Guan, Guohui
    Xue, Wei
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1132 - 1136
  • [2] Audio-visual graphical models for speech processing
    Hershey, J
    Attias, H
    Jojic, N
    Kristjansson, T
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS: DESIGN AND IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS INDUSTRY TECHNOLOGY TRACKS MACHINE LEARNING FOR SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING SIGNAL PROCESSING FOR EDUCATION, 2004, : 649 - 652
  • [3] AUDIO-VISUAL SPEAKER LOCALIZATION VIA WEIGHTED CLUSTERING
    Gebru, Israel D.
    Alameda-Pineda, Xavier
    Horaud, Radu
    Forbes, Florence
    2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2014,
  • [4] Audio-visual speaker identification using coupled hidden markov models
    Fu, T
    Liu, XX
    Liang, LH
    Pi, XB
    Nefian, AV
    2003 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL 3, PROCEEDINGS, 2003, : 29 - 32
  • [5] Probabilistic speaker localization in noisy environments by audio-visual integration
    Choi, Jong-Suk
    Kim, Munsang
    Kim, Hyun-Don
    2006 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-12, 2006, : 4704 - +
  • [6] Audio-Visual Clustering for 3D Speaker Localization
    Khalidov, Vasil
    Forbes, Florence
    Hansard, Miles
    Arnaud, Elise
    Horaud, Radu
    MACHINE LEARNING FOR MULTIMODAL INTERACTION, PROCEEDINGS, 2008, 5237 : 86 - 97
  • [7] Paper: Speaker Localization Based on Audio-Visual Bimodal Fusion
    Zhu, Ying-Xin
    Jin, Hao-Ran
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2021, 25 (03) : 375 - 382
  • [8] AV16.3: An audio-visual corpus for speaker localization and tracking
    Lathoud, G
    Odobez, JM
    Gatica-Perez, D
    MACHINE LEARNING FOR MULTIMODAL INTERACTION, 2005, 3361 : 182 - 195
  • [9] Audio-Visual Synchronisation for Speaker Diarisation
    Garau, Giulia
    Dielmann, Alfred
    Bourlard, Herve
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2662 - +
  • [10] Speaker localisation using audio-visual synchrony: An empirical study
    Nock, HJ
    Iyengar, G
    Neti, C
    IMAGE AND VIDEO RETRIEVAL, PROCEEDINGS, 2003, 2728 : 488 - 499