Active Speakers in Context

Times cited: 19

Authors:

Alcazar, Juan Leon [1]

Heilbron, Fabian Caba [2]

Mai, Long [2]

Perazzi, Federico [2]

Lee, Joon-Young [2]

Arbelaez, Pablo [1]

Ghanem, Bernard [3]

Affiliations:

[1] Univ Los Andes, Bogota, Colombia

[2] Adobe Res, San Jose, CA USA

[3] King Abdullah Univ Sci & Technol KAUST, Thuwal, Saudi Arabia

Source:

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020


DOI:

10.1109/CVPR42600.2020.01248

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Current methods for active speaker detection focus on modeling audiovisual information from a single speaker. This strategy can be adequate for single-speaker scenarios, but it prevents accurate detection when the task is to identify which of many candidate speakers is talking. This paper introduces the Active Speaker Context, a novel representation that models relationships between multiple speakers over long time horizons. Our new model learns pairwise and temporal relations from a structured ensemble of audiovisual observations. Our experiments show that a structured feature ensemble alone already benefits active speaker detection performance. We also find that the proposed Active Speaker Context improves the state of the art on the AVA-ActiveSpeaker dataset, achieving an mAP of 87.1%. Moreover, ablation studies verify that this result is a direct consequence of our long-term multi-speaker analysis.
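The abstract describes the context representation only at a high level. As a rough illustration of what a "structured ensemble" of multi-speaker, multi-timestep observations could look like, the sketch below arranges per-speaker feature vectors into a time-by-speaker grid centred on a reference speaker, then pairs the reference feature with every speaker at the same timestep. Everything here is an assumption for illustration: the function names `build_context` and `pairwise_pairs`, the `window` and `max_speakers` parameters, zero padding, and sorted speaker order are hypothetical, and plain concatenation stands in for the relations that, per the abstract, the authors' model learns.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def build_context(
    features: Dict[str, Sequence[Vector]],
    reference: str,
    t: int,
    window: int,
    max_speakers: int,
) -> List[List[Vector]]:
    """Arrange features into a [time][speaker] grid centred on timestep t.

    Hypothetical layout (not the authors' implementation): rows cover
    timesteps t - window .. t + window, column 0 is always the reference
    speaker, the remaining columns hold the other speakers in sorted id
    order, and missing speakers or timesteps are filled with zero vectors.
    """
    dim = len(features[reference][0])
    others = sorted(s for s in features if s != reference)
    speakers: List[Optional[str]] = [reference, *others][:max_speakers]
    speakers += [None] * (max_speakers - len(speakers))  # pad empty speaker slots

    grid: List[List[Vector]] = []
    for tau in range(t - window, t + window + 1):
        row: List[Vector] = []
        for s in speakers:
            seq = features[s] if s is not None else None
            if seq is not None and 0 <= tau < len(seq):
                row.append(list(seq[tau]))
            else:
                row.append([0.0] * dim)  # zero padding outside a track or for empty slots
        grid.append(row)
    return grid


def pairwise_pairs(grid: List[List[Vector]]) -> List[List[Vector]]:
    """Concatenate the reference feature (column 0) with every column at the same timestep."""
    return [[row[0] + other for other in row] for row in grid]


if __name__ == "__main__":
    # Toy 2-D features for two hypothetical speakers; "b" has a shorter track.
    feats = {
        "a": [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]],
        "b": [[0.5, 0.5], [0.2, 0.8]],
    }
    ctx = build_context(feats, reference="a", t=1, window=1, max_speakers=3)
    print(pairwise_pairs(ctx)[1][1])  # [1.0, 1.0, 0.2, 0.8]
```

Keeping the reference speaker in a fixed column is one simple way to give a downstream model a consistent layout; a learned model would replace the concatenation step with its own pairwise and temporal reasoning.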

Pages: 12462-12471

Number of pages: 10

References (40 in total):

[1] AGRAWAL AK, 2018, IEEE CONF NANOTECH

[2] Diagnosing Error in Temporal Action Detectors [J]. Alwassel, Humam; Heilbron, Fabian Caba; Escorcia, Victor; Ghanem, Bernard. COMPUTER VISION - ECCV 2018, PT III, 2018, 11207: 264-280

[3] [Anonymous], 2015, PROC CVPR IEEE, DOI 10.1109/CVPR.2015.7298698

[4] Chakravarty Punarjay, 2015, INT C MULT INT ICMI

[5] Chakravarty Punarjay, 2016, INT C MULT INT ICMI

[6] Chung J. S., 2019, ARXIV

[7] Lip Reading Sentences in the Wild [J]. Chung, Joon Son; Senior, Andrew; Vinyals, Oriol; Zisserman, Andrew. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 3444-3450

[8] Lip Reading in the Wild [J]. Chung, Joon Son; Zisserman, Andrew. COMPUTER VISION - ACCV 2016, PT II, 2017, 10112: 87-103

[9] Chung Joon Son, 2017, arXiv

[10] Chung Joon Son, 2018, arXiv
