Audio-visual speaker localization using graphical models

Cited by: 0
Authors
Kushal, Akash [1]
Rahurkar, Mandar [2]
Li Fei-Fei [2]
Ponce, Jean [1]
Huang, Thomas [2]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
Source
18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS | 2006
Funding
National Science Foundation (USA)
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work we propose an approach to combine audio and video modalities for person tracking using graphical models. We demonstrate a principled and intuitive framework for combining these modalities to obtain robustness against occlusion and change in appearance. We further exploit the temporal correlations that exist for a moving object between adjacent frames to account for the cases where having both modalities might still not be enough, e.g., when the person being tracked is occluded and not speaking. Improvement in tracking results is shown at each step and compared with manually annotated ground truth.
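
The abstract's idea of fusing two modalities and leaning on temporal correlation when neither is informative can be illustrated with a small sketch. This is not the authors' model: it assumes a one-dimensional grid of candidate positions, per-frame audio and video likelihoods that are conditionally independent given the position, and a simple "stay or move to a neighbouring cell" motion model. All function names, the `stay_prob` parameter, and the numbers in the usage example are hypothetical.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1 (uniform if all are zero)."""
    total = sum(weights)
    if total <= 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def predict(belief, stay_prob=0.6):
    """Temporal step (assumed motion model): stay put or move to an adjacent cell."""
    n = len(belief)
    predicted = [0.0] * n
    for i, p in enumerate(belief):
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < n]
        predicted[i] += stay_prob * p
        for j in neighbors:
            predicted[j] += (1.0 - stay_prob) * p / len(neighbors)
    return predicted


def update(predicted, audio_lik=None, video_lik=None):
    """Fuse modalities by multiplying likelihoods; a missing modality is flat."""
    n = len(predicted)
    audio = audio_lik if audio_lik is not None else [1.0] * n
    video = video_lik if video_lik is not None else [1.0] * n
    return normalize([p * a * v for p, a, v in zip(predicted, audio, video)])


def track(frames, n_cells):
    """Forward filtering over frames; returns the most probable cell per frame."""
    belief = [1.0 / n_cells] * n_cells
    estimates = []
    for audio_lik, video_lik in frames:
        belief = update(predict(belief), audio_lik, video_lik)
        estimates.append(max(range(n_cells), key=belief.__getitem__))
    return estimates


# Hypothetical three-frame sequence over five candidate positions.
frames = [
    # Frame 1: both modalities point at cell 1.
    ([0.10, 0.60, 0.20, 0.05, 0.05], [0.05, 0.80, 0.10, 0.03, 0.02]),
    # Frame 2: person occluded (no video), audio points at cell 2.
    ([0.02, 0.10, 0.80, 0.06, 0.02], None),
    # Frame 3: occluded and silent, so only the temporal prior is available.
    (None, None),
]
estimates = track(frames, n_cells=5)
print(estimates)
```

In the third frame neither modality contributes, so the estimate is carried entirely by the belief propagated from the previous frame, which is the situation the abstract describes for an occluded, non-speaking person.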
Pages: 291 / +
Page count: 2