Active Speakers in Context

Cited by: 19
Authors
Alcazar, Juan Leon [1]
Heilbron, Fabian Caba [2]
Mai, Long [2]
Perazzi, Federico [2]
Lee, Joon-Young [2]
Arbelaez, Pablo [1]
Ghanem, Bernard [3]
Affiliations
[1] Univ Los Andes, Bogota, Colombia
[2] Adobe Res, San Jose, CA USA
[3] King Abdullah Univ Sci & Technol KAUST, Thuwal, Saudi Arabia
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020
Keywords
DOI
10.1109/CVPR42600.2020.01248
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Current methods for active speaker detection focus on modeling audiovisual information from a single speaker. This strategy can be adequate for addressing single-speaker scenarios, but it prevents accurate detection when the task is to identify which of many candidate speakers are talking. This paper introduces the Active Speaker Context, a novel representation that models relationships between multiple speakers over long time horizons. Our new model learns pairwise and temporal relations from a structured ensemble of audiovisual observations. Our experiments show that a structured feature ensemble already benefits active speaker detection performance. We also find that the proposed Active Speaker Context improves the state of the art on the AVA-ActiveSpeaker dataset, achieving an mAP of 87.1%. Moreover, ablation studies verify that this result is a direct consequence of our long-term multi-speaker analysis.
Pages: 12462-12471
Page count: 10