Audio-visual speaker localization using graphical models

Cited by: 0

Authors:

Kushal, Akash [1]

Rahurkar, Mandar [2]

Li, Fei-Fei [2]

Ponce, Jean [1]

Huang, Thomas [2]

Affiliations:

[1] University of Illinois, Department of Computer Science, Urbana, IL 61801, USA

[2] University of Illinois, Department of Electrical and Computer Engineering, Urbana, IL 61801, USA

Source:

18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS | 2006

Funding:

U.S. National Science Foundation

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this work we propose an approach to combine audio and video modalities for person tracking using graphical models. We demonstrate a principled and intuitive framework for combining these modalities to obtain robustness against occlusion and change in appearance. We further exploit the temporal correlations that exist for a moving object between adjacent frames to account for the cases where having both modalities might still not be enough, e.g., when the person being tracked is occluded and not speaking. Improvement in tracking results is shown at each step and compared with manually annotated ground truth.
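
The fusion idea in the abstract can be illustrated with a small sketch. This is not the authors' model: it assumes a one-dimensional grid of candidate positions, Gaussian audio and video likelihoods, and a Gaussian motion model between adjacent frames, and every name in it (`track`, `update`, `predict`, `motion_sigma`) is hypothetical. A flat likelihood stands in for a modality that carries no information, for example video during occlusion or audio during silence.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def gaussian_likelihood(n_cells, center, sigma):
    """Unnormalized Gaussian bump over grid cells (assumed sensor model)."""
    return [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(n_cells)]


def flat_likelihood(n_cells):
    """Uninformative observation: occluded video or a silent speaker."""
    return [1.0] * n_cells


def predict(belief, motion_sigma):
    """Temporal step: spread each cell's mass to nearby cells."""
    n = len(belief)
    predicted = [0.0] * n
    for j, mass in enumerate(belief):
        # Transition probabilities out of cell j, normalized so mass is conserved.
        kernel = normalize(gaussian_likelihood(n, j, motion_sigma))
        for i in range(n):
            predicted[i] += mass * kernel[i]
    return predicted


def update(belief, audio_lik, video_lik, motion_sigma=1.5):
    """One filtering step: motion prior times both modality likelihoods."""
    prior = predict(belief, motion_sigma)
    posterior = [p * a * v for p, a, v in zip(prior, audio_lik, video_lik)]
    return normalize(posterior)


def track(frames, n_cells, motion_sigma=1.5):
    """Return the most probable cell per frame, starting from a uniform belief."""
    belief = normalize(flat_likelihood(n_cells))
    estimates = []
    for audio_lik, video_lik in frames:
        belief = update(belief, audio_lik, video_lik, motion_sigma)
        estimates.append(max(range(n_cells), key=belief.__getitem__))
    return estimates


n_cells = 20
frames = [
    # Frame 1: person visible and speaking near cell 10.
    (gaussian_likelihood(n_cells, 10, 2.0), gaussian_likelihood(n_cells, 10, 1.0)),
    # Frame 2: occluded and silent, so only the temporal prior is available.
    (flat_likelihood(n_cells), flat_likelihood(n_cells)),
    # Frame 3: still occluded, but speaking near cell 12.
    (gaussian_likelihood(n_cells, 12, 1.0), flat_likelihood(n_cells)),
]
print(track(frames, n_cells))
```

In the second frame neither modality is informative, so the estimate rests entirely on the belief carried over from the previous frame, which is the case the abstract singles out as needing temporal correlation.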

Citation

Pages: 291 / +

Page count: 2

