Audio-visual speaker localization using graphical models

被引:0
作者
Kushal, Akash [1 ]
Rahurkar, Mandar [2 ]
Li Fei-Fei [2 ]
Ponce, Jean [1 ]
Huang, Thomas [2 ]
机构
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
来源
18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS | 2006年
基金
美国国家科学基金会;
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this work we propose an approach to combine audio and video modalities for person tracking using graphical models. We demonstrate a principled and intuitive framework for combining these modalities to obtain robustness against occlusion and change in appearance. We further exploit the temporal correlations that exist for a moving object between adjacent frames to account for the cases where having both modalities might still not be enough, e.g., when the person being tracked is occluded and not speaking. Improvement in tracking results is shown at each step and compared with manually annotated ground truth.
引用
收藏
页码:291 / +
页数:2
相关论文
共 50 条
  • [31] Egocentric Audio-Visual Object Localization
    Huang, Chao
    Flan, Yapeng
    Kurnar, Anurag
    Xu, Chenliang
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 22910 - 22921
  • [32] Method of speech recognition and speaker identification using audio-visual of polish speech and hidden Markov models
    Kubanek, Mariusz
    [J]. BIOMETRICS, COMPUTER SECURITY SYSTEMS AND ARTIFICIAL INTELLIGENCE APPLICATIONS, 2006, : 45 - 55
  • [33] The 'Audio-Visual Face Cover Corpus': Investigations into audio-visual speech and speaker recognition when the speaker's face is occluded by facewear
    Fecher, Natalie
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2247 - 2250
  • [34] AVA ACTIVE SPEAKER: AN AUDIO-VISUAL DATASET FOR ACTIVE SPEAKER DETECTION
    Roth, Joseph
    Chaudhuri, Sourish
    Klejch, Ondrej
    Marvin, Radhika
    Gallagher, Andrew
    Kaver, Liat
    Ramaswamy, Sharadh
    Stopczynski, Arkadiusz
    Schmid, Cordelia
    Xi, Zhonghua
    Pantofaru, Caroline
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4492 - 4496
  • [35] Audio-Visual Speech Recognition in the Presence of a Competing Speaker
    Shao, Xu
    Barker, Jon
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1292 - 1295
  • [36] Speaker and digit recognition by audio-visual lip biometrics
    Faraj, Maycel Isaac
    Bigun, Josef
    [J]. ADVANCES IN BIOMETRICS, PROCEEDINGS, 2007, 4642 : 1016 - +
  • [37] Dynamic Bayesian Networks for audio-visual speaker recognition
    Li, DD
    Yang, YC
    Wu, ZH
    [J]. ADVANCES IN BIOMETRICS, PROCEEDINGS, 2006, 3832 : 539 - 545
  • [38] Audio-visual speaker identification with asynchronous articulatory feature
    Chen, Yanxiang
    Liu, M.
    [J]. ELECTRONICS LETTERS, 2010, 46 (03) : 242 - U77
  • [39] Speaker independent audio-visual continuous speech recognition
    Liang, LH
    Liu, XX
    Zhao, YB
    Pi, XB
    Nefian, AV
    [J]. IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I AND II, PROCEEDINGS, 2002, : A25 - A28
  • [40] Dynamic dependency tests for audio-visual speaker association
    Siracusa, Michael R.
    Fisher, John W., III
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PTS 1-3, 2007, : 457 - +