Saliency-based YOLO for single target detection

Cited by: 17

Authors:

Hu, Jun-ying [1,2]

Shi, C.-J. Richard [3]

Zhang, Jiang-she [2]

Affiliations:

[1] Northwest Univ, Sch Math, Xian 710127, Peoples R China

[2] Xi An Jiao Tong Univ, Sch Math & Stat, Xian 710049, Peoples R China

[3] Univ Washington, Dept Elect Engn, Seattle, WA 98195 USA

Source:

KNOWLEDGE AND INFORMATION SYSTEMS | 2021 / Vol. 63 / Issue 03

Keywords:

Deep neural network; Object detection; Saliency map; Visual saliency;

DOI:

10.1007/s10115-020-01538-0

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

At present, You Only Look Once (YOLO) is the fastest real-time object detection system based on a unified deep neural network. During training, YOLO divides the input image into S x S grid cells, and only the one grid cell that contains the center of an object is responsible for detecting that object. However, it is not certain that the cell containing the object's center is the best choice for detecting it. In this paper, inspired by the visual saliency mechanism, we introduce a saliency map into YOLO to develop the YOLO3-SM method, in which the saliency map selects the grid cell containing the most salient part of the object to detect that object. Experimental results on two data sets show that the prediction boxes of YOLO3-SM obtain larger IOU values, which demonstrates that, compared with YOLO3, YOLO3-SM selects a cell that is more suitable for detecting the object. In addition, YOLO3-SM achieves a higher mAP than three other state-of-the-art object detection methods on the two data sets, which shows that introducing the saliency map into YOLO can improve detection performance.
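The difference between the two cell-assignment rules described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how "most salient part" is scored, so the sketch assumes the mean saliency of the pixels where each grid cell overlaps the ground-truth box. The function names `center_cell`, `most_salient_cell`, and `iou` are hypothetical, and boxes are assumed to be `(x_min, y_min, x_max, y_max)` in pixels.

```python
def center_cell(box, img_w, img_h, S):
    """Return (row, col) of the grid cell containing the box center (YOLO-style rule)."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    # Clamp so a center lying exactly on the right/bottom edge stays inside the grid.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


def most_salient_cell(box, saliency, S):
    """Return (row, col) of the grid cell with the highest mean saliency inside the box.

    `saliency` is a 2-D list with img_h rows and img_w columns.
    ASSUMPTION: the score is the mean saliency over pixels shared by cell and box;
    the paper's actual selection rule may differ.
    """
    img_h, img_w = len(saliency), len(saliency[0])
    x_min, y_min, x_max, y_max = box
    sums, counts = {}, {}
    for y in range(max(0, int(y_min)), min(img_h, int(y_max))):
        for x in range(max(0, int(x_min)), min(img_w, int(x_max))):
            cell = (min(y * S // img_h, S - 1), min(x * S // img_w, S - 1))
            sums[cell] = sums.get(cell, 0.0) + saliency[y][x]
            counts[cell] = counts.get(cell, 0) + 1
    if not sums:
        # Degenerate box covering no pixels: fall back to the center rule.
        return center_cell(box, img_w, img_h, S)
    return max(sums, key=lambda c: sums[c] / counts[c])


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0
```

For an object whose salient region lies off-center, `center_cell` and `most_salient_cell` return different cells. The `iou` helper is the standard overlap measure that the abstract uses to compare the resulting prediction boxes.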

References

Pages: 717-732

Page count: 16

14 entries in total

[1] State-of-the-Art in Visual Attention Modeling [J]. Borji, Ali; Itti, Laurent. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35 (01): 185-207

[2] Girshick R., 2015, P IEEE INT C COMP VI, DOI 10.1109/ICCV.2015.169

[3] Rich feature hierarchies for accurate object detection and semantic segmentation [J]. Girshick, Ross; Donahue, Jeff; Darrell, Trevor; Malik, Jitendra. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014: 580-587

[4] He KM, 2020, IEEE T PATTERN ANAL, V42, P386, DOI [10.1109/TPAMI.2018.2844175, 10.1109/ICCV.2017.322]

[5] Joseph RK, 2016, CRIT POL ECON S ASIA, P1

[6] SSD: Single Shot MultiBox Detector [J]. Liu, Wei; Anguelov, Dragomir; Erhan, Dumitru; Szegedy, Christian; Reed, Scott; Fu, Cheng-Yang; Berg, Alexander C. COMPUTER VISION - ECCV 2016, PT I, 2016, 9905: 21-37

[7] Redmon J, 2018, arXiv:1804.02767

[8] You Only Look Once: Unified, Real-Time Object Detection [J]. Redmon, Joseph; Divvala, Santosh; Girshick, Ross; Farhadi, Ali. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 779-788

[9] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [J]. Ren, Shaoqing; He, Kaiming; Girshick, Ross; Sun, Jian. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (06): 1137-1149

[10] Rutishauser U, 2004, PROC CVPR IEEE, P37
