Top-down, Spatio-Temporal Attentional Guidance for On-road Object Detection

Times cited: 0
Authors
Withanawasam, Jayani [1 ]
Javanmardi, Ehsan [2 ]
Kamijo, Shunsuke [1 ]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Informat & Commun Engn, Tokyo, Japan
[2] Univ Tokyo, Inst Ind Sci, Tokyo, Japan
Source
2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) | 2020
Keywords
computational visual attention; road scene understanding; intelligent vehicles; VISUAL-ATTENTION;
DOI
10.1109/itsc45102.2020.9294465
CLC numbers
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline codes
0808; 0809;
Abstract
On-road object detection is a crucial component of environmental perception in intelligent vehicles. Anchor generation is an intermediate step in object detection that derives a set of reference boxes for possible objects at different scales and aspect ratios. State-of-the-art object detectors rely on a massive number of pre-determined anchors spread over the whole scene, and the resulting operational cost is a drawback in resource-constrained, mobile environments. In contrast, humans rapidly attend in detail to the regions of a scene that are relevant, based on prior knowledge of the current goal and the task at hand. Inspired by this observation, we computationally model this top-down visual attention mechanism for the driving task to guide the anchoring process of on-road object detection. In particular, we use knowledge about the environmental risk level and the underlying risk factors specific to driving to derive an attention region over which to remain vigilant. We then perform anchor generation and the subsequent object-detection operations only within the extracted attention region. Experimental results demonstrate that the proposed method significantly reduces the operational cost while preserving competitive accuracy.
Pages: 8
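The anchoring idea described in the abstract can be illustrated with a short sketch. This is a minimal, hypothetical example rather than the authors' implementation: it assumes a standard grid-based anchor scheme (in the style of region-proposal detectors) and shows anchors being laid out only inside a rectangular attention region, rather than over the entire scene. The function name, parameters, and example region coordinates are assumptions for illustration.

```python
import numpy as np

def generate_anchors(region, stride=16, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Generate reference boxes only inside a rectangular attention region.

    region: (x1, y1, x2, y2) in image pixels, e.g. a high-risk area supplied
            by a top-down attention model (illustrative input, not the paper's API).
    Returns an (N, 4) array of anchor boxes as (x1, y1, x2, y2).
    """
    x1, y1, x2, y2 = region
    # Anchor centres on a regular grid restricted to the attention region,
    # instead of covering the whole image.
    cx = np.arange(x1 + stride // 2, x2, stride)
    cy = np.arange(y1 + stride // 2, y2, stride)
    anchors = []
    for s in scales:
        for r in ratios:
            w, h = s * np.sqrt(r), s / np.sqrt(r)
            for yc in cy:
                for xc in cx:
                    anchors.append([xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2])
    return np.asarray(anchors)

# Whole-scene anchoring vs. attention-guided anchoring on a 1024x512 image.
full = generate_anchors((0, 0, 1024, 512))
attended = generate_anchors((256, 192, 768, 448))  # hypothetical attention region
print(full.shape[0], attended.shape[0])            # far fewer anchors to process
```

Because the anchor count scales with the area covered, restricting anchor generation (and all downstream classification and regression) to the attention region is what yields the reduction in operational cost reported in the abstract.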