Multi-Scale Attention Deep Neural Network for Fast Accurate Object Detection

Cited by: 55
Authors
Song, Kaiyou [1]
Yang, Hua [1]
Yin, Zhouping [1]
Affiliations
[1] Huazhong Univ Sci & Technol, State Key Lab Digital Mfg Equipment & Technol, Sch Mech Sci & Engn, Wuhan 430074, Hubei, Peoples R China
Funding
National Science Foundation (United States);
Keywords
Object detection; attention model; feature fusion; deep neural network;
DOI
10.1109/TCSVT.2018.2875449
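Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract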
Object detection remains a challenging task in computer vision due to the tremendous extent of changes in the appearances of objects caused by clustered backgrounds, occlusion, truncation, and scale change. Current deep neural network (DNN)-based object detection methods cannot simultaneously achieve a high accuracy and a high efficiency. To overcome this limitation, in this paper, we propose a novel multi-scale attention (MSA) DNN for accurate object detection with high efficiency. The proposed MSA-DNN method utilizes a novel multi-scale feature fusion module (MSFFM) to construct high-level semantic features. Subsequently, a novel MSA module (MSAM) based on the fused layers of the MSFFM is introduced to exploit the global semantic information of image-level labels to guide detection. On the one hand, MSAM can capture global semantic information to further enhance the semantic feature representation of the fused layers constructed by the MSFFM, thereby improving the detection accuracy. On the other hand, the MSA maps generated by MSAM can be employed to rapidly and coarsely locate objects at different scales. In addition, an attention-based hard negative mining strategy is introduced to filter out negative samples to reduce the search space, dramatically alleviating the severe class imbalance problem. Extensive experimental results on the challenging PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO datasets demonstrate that MSA-DNN achieves a state-of-the-art detection accuracy while maintaining a high efficiency. Furthermore, MSA-DNN significantly improves the small-object detection accuracy.
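The attention-guided filtering of negative samples mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual rule, so everything here is an assumption made for illustration: the function name `filter_negatives`, the fixed `threshold`, the representation of an attention map as a 2D list of scores in [0, 1], and the representation of candidate boxes as `(row, col, is_positive)` tuples. It is not the authors' implementation.

```python
# Hypothetical sketch: use a coarse attention map to discard easy negatives.
# All names, the data layout, and the threshold rule are illustrative assumptions.

def filter_negatives(attention, anchors, threshold=0.5):
    """Keep every positive anchor, and only the negatives located on
    cells whose attention score is at least `threshold`."""
    kept = []
    for row, col, is_positive in anchors:
        if is_positive or attention[row][col] >= threshold:
            kept.append((row, col, is_positive))
    return kept


# Toy 3x3 attention map: high values mark cells likely to contain an object.
attention = [
    [0.1, 0.2, 0.1],
    [0.2, 0.9, 0.7],
    [0.1, 0.6, 0.3],
]

# Candidate anchors as (row, col, is_positive).
anchors = [
    (0, 0, False),  # negative on a low-attention cell: dropped
    (1, 1, True),   # positive: always kept
    (1, 2, False),  # negative on a high-attention cell: kept as a hard negative
    (2, 1, False),  # negative on a high-attention cell: kept as a hard negative
    (2, 2, False),  # negative on a low-attention cell: dropped
    (0, 2, True),   # positive on a low-attention cell: still kept
]

kept = filter_negatives(attention, anchors)
negatives_before = sum(1 for a in anchors if not a[2])
negatives_after = sum(1 for a in kept if not a[2])
print(f"negatives: {negatives_before} -> {negatives_after}")
```

In this toy setup, discarding negatives on low-attention cells shrinks the negative set while leaving the positives untouched, which is the general sense in which such filtering could reduce the search space and ease class imbalance.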
Pages: 2972-2985
Number of pages: 14