Active Speakers in Context

Times cited: 19
Authors
Alcazar, Juan Leon [1]
Heilbron, Fabian Caba [2]
Mai, Long [2]
Perazzi, Federico [2]
Lee, Joon-Young [2]
Arbelaez, Pablo [1]
Ghanem, Bernard [3]
Affiliations
[1] Universidad de los Andes, Bogotá, Colombia
[2] Adobe Research, San Jose, CA, USA
[3] King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Source
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020
DOI
10.1109/CVPR42600.2020.01248
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Current methods for active speaker detection focus on modeling audiovisual information from a single speaker. This strategy can be adequate for single-speaker scenarios, but it prevents accurate detection when the task is to identify which of many candidate speakers are talking. This paper introduces the Active Speaker Context, a novel representation that models relationships between multiple speakers over long time horizons. Our new model learns pairwise and temporal relations from a structured ensemble of audiovisual observations. Our experiments show that a structured feature ensemble already improves active speaker detection performance. We also find that the proposed Active Speaker Context improves the state of the art on the AVA-ActiveSpeaker dataset, achieving an mAP of 87.1%. Moreover, ablation studies verify that this result is a direct consequence of our long-term multi-speaker analysis.
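The abstract describes the Active Speaker Context only at a high level. As a rough illustration, the Python sketch below does three things. First, it arranges audiovisual features into a time-by-speaker grid, which is one possible reading of a "structured ensemble". Second, it relates the speakers at each time step with scaled dot-product attention. Third, it pools the reference speaker's features over time. All function names are hypothetical. The same holds for the concatenation layout, the attention operator standing in for the pairwise relations, and the mean pooling standing in for temporal modeling. They are assumptions for illustration, not the authors' architecture, which this record does not describe.

```python
# Hypothetical sketch: names, layout, and operators are illustrative
# assumptions, not the Active Speaker Context implementation.
import math
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # indexed [time][speaker] -> feature vector


def build_context_ensemble(audio: List[Vector], visual: List[List[Vector]]) -> Grid:
    """Stack audiovisual observations into a T x N grid of features.

    audio[t] is an (assumed) audio embedding for time step t; visual[t][n] is
    an (assumed) face embedding for candidate speaker n at step t, with
    candidate 0 treated as the reference speaker. Each cell concatenates the
    shared audio embedding with one face embedding.
    """
    if len(audio) != len(visual):
        raise ValueError("audio and visual must cover the same time steps")
    return [[a + v for v in faces] for a, faces in zip(audio, visual)]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_refine(ensemble: Grid) -> Grid:
    """Scaled dot-product attention across the N speakers at each time step.

    Each speaker's feature is replaced by a softmax-weighted average of all
    speakers' features at the same step. This is an assumed stand-in for a
    learned pairwise relation module (no learned projections here).
    """
    refined: Grid = []
    for row in ensemble:
        dim = len(row[0])
        scale = 1.0 / math.sqrt(dim)
        new_row = []
        for query in row:
            scores = [scale * sum(q * k for q, k in zip(query, key)) for key in row]
            weights = softmax(scores)
            new_row.append(
                [sum(w * feat[i] for w, feat in zip(weights, row)) for i in range(dim)]
            )
        refined.append(new_row)
    return refined


def temporal_pool(refined: Grid, speaker: int = 0) -> Vector:
    """Average one speaker's refined features over all time steps.

    Mean pooling is an assumed, simplified stand-in for a learned temporal model.
    """
    steps = len(refined)
    dim = len(refined[0][speaker])
    return [sum(refined[t][speaker][i] for t in range(steps)) / steps for i in range(dim)]


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0]]              # T=2 steps, 2-dim audio
    visual = [[[1.0, 0.0], [0.0, 1.0]],           # N=2 candidates per step
              [[0.5, 0.5], [1.0, 0.0]]]
    context = build_context_ensemble(audio, visual)  # 2 x 2 grid of 4-dim cells
    reference_feature = temporal_pool(pairwise_refine(context), speaker=0)
    print(len(reference_feature))  # 4
```

Because the attention weights at each step sum to one, every refined feature is a convex combination of the speakers visible at that step. This is a simple way to let one candidate's representation depend on the others. A trained model would add learned projections and a real temporal module.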
Pages: 12462-12471
Number of pages: 10