Audio-visual speaker localization using graphical models

Cited by: 0

Authors:

Kushal, Akash ^{[1]}

Rahurkar, Mandar ^{[2]}

Li Fei-Fei ^{[2]}

Ponce, Jean ^{[1]}

Huang, Thomas ^{[2]}

Affiliations:

[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA

[2] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA

Source:

18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS | 2006

Funding:

U.S. National Science Foundation;

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this work we propose an approach to combine audio and video modalities for person tracking using graphical models. We demonstrate a principled and intuitive framework for combining these modalities to obtain robustness against occlusion and change in appearance. We further exploit the temporal correlations that exist for a moving object between adjacent frames to account for the cases where having both modalities might still not be enough, e.g., when the person being tracked is occluded and not speaking. Improvement in tracking results is shown at each step and compared with manually annotated ground truth.
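
The fusion idea in the abstract can be illustrated with a small sketch. This record contains only the abstract, so the code below is not the authors' model: it assumes a one-dimensional grid of candidate positions, per-frame audio and video likelihoods that are conditionally independent given the position, and a simple "stay or step to a neighbor" motion prior. All function names and the `stay` parameter are hypothetical. The chain of positions over time is a hidden Markov model, one of the simplest graphical models, and the forward pass shown here is ordinary Bayesian filtering.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def predict(belief, stay=0.6):
    """Temporal prior (assumed): stay put with probability `stay`,
    otherwise step one cell left or right; steps are clamped at the edges."""
    n = len(belief)
    move = (1.0 - stay) / 2.0
    out = [0.0] * n
    for i, b in enumerate(belief):
        out[i] += stay * b
        out[max(i - 1, 0)] += move * b
        out[min(i + 1, n - 1)] += move * b
    return out


def update(belief, audio_lik, video_lik):
    """Fuse both modalities by multiplying their likelihoods into the belief
    (assumes audio and video are independent given the position)."""
    return normalize([b * a * v for b, a, v in zip(belief, audio_lik, video_lik)])


def track(audio_seq, video_seq, n_cells):
    """Forward filtering over frames; returns the most likely cell per frame."""
    belief = [1.0 / n_cells] * n_cells
    estimates = []
    for audio_lik, video_lik in zip(audio_seq, video_seq):
        belief = update(predict(belief), audio_lik, video_lik)
        estimates.append(max(range(n_cells), key=belief.__getitem__))
    return estimates


# Toy data: a flat likelihood means the modality carries no information.
flat = [1.0] * 5
peak_at_2 = [0.1, 0.1, 1.0, 0.1, 0.1]
peak_at_3 = [0.1, 0.1, 0.1, 1.0, 0.1]

# Frame 1: seen and heard at cell 2. Frame 2: occluded and silent.
# Frame 3: visible again at cell 3, still silent.
audio_seq = [peak_at_2, flat, flat]
video_seq = [peak_at_2, flat, peak_at_3]
print(track(audio_seq, video_seq, n_cells=5))
```

In the second frame neither modality is informative, so the estimate is carried entirely by the temporal prior; this is the "occluded and not speaking" case the abstract mentions.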

Pages: 291 / +

Page count: 2
