Residual attention mechanism and weighted feature fusion for multi-scale object detection

Cited by: 3
Authors
Zhang, Jie [1]
Qi, Qiye [1]
Zhang, Huanlong [1]
Du, Qifan [1]
Wang, Fengxian [1]
Shi, Xiaoping [2]
Affiliations
[1] Zhengzhou Univ Light Ind, Coll Elect & Informat Engn, Dongfeng Rd, Zhengzhou 450002, Henan, Peoples R China
[2] Harbin Inst Technol, Harbin, Peoples R China
Funding
National Science Foundation (US);
Keywords
Deep learning; Object detection; Residual attention mechanism; Weighted feature fusion;
DOI
10.1007/s11042-023-14997-8
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Object detection is a critical problem in computer vision research and an essential basis for understanding the high-level semantic information of images. To improve object detection performance, this article proposes an improved YOLOv3 multi-scale object detection method. First, a residual attention module, consisting of a channel attention module, a spatial attention module, and a skip connection, is introduced into the neck of YOLOv3. The module is applied to each of the three feature levels produced by the backbone, so that the output features focus on the channels and spatial regions relevant to the objects. Second, an additional weight is assigned to each input feature in the top-down feature fusion stage of YOLOv3; the magnitude of each weight is determined by how much that input contributes to the output features. Experimental results on the KITTI, PASCAL VOC, and bird's nest datasets verify the effectiveness of the proposed method for object detection. The proposed method has significant practical value for electric power inspection and self-driving vehicles.
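The abstract does not give layer-level details, so the following NumPy sketch only illustrates the two ideas under stated assumptions. The channel and spatial attention are assumed to follow the common CBAM design (average- and max-pooled descriptors, a shared two-layer MLP, and a 7×7 convolution over channel-pooled maps); the skip connection is assumed to be an element-wise addition of the input; and the fusion weights are assumed to be clipped to be non-negative and normalized to sum to about one (BiFPN-style fast normalized fusion) before the weighted inputs are concatenated, as in YOLOv3's top-down path. All function names are hypothetical, and the random arrays stand in for learned parameters.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """Per-channel weights in (0, 1) for a (C, H, W) feature map.

    w1 (C//r, C) and w2 (C, C//r) form a shared two-layer MLP (assumed, CBAM-style)
    applied to the average- and max-pooled channel descriptors.
    """
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # FC -> ReLU -> FC

    avg = x.mean(axis=(1, 2))  # (C,)
    mx = x.max(axis=(1, 2))    # (C,)
    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]  # (C, 1, 1)


def spatial_attention(x, k):
    """Per-location weights in (0, 1): a KxK convolution (kernel k of shape (2, K, K))
    over the channel-wise mean and max maps, followed by a sigmoid."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    size = k.shape[-1]
    pad = size // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H, W
    h, w = x.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + size, j:j + size] * k)
    return sigmoid(out)[None]  # (1, H, W)


def residual_attention(x, w1, w2, k):
    """Channel attention, then spatial attention, plus an (assumed additive) skip connection."""
    y = x * channel_attention(x, w1, w2)
    y = y * spatial_attention(y, k)
    return x + y


def weighted_topdown_fusion(deep, lateral, weights, eps=1e-4):
    """Weight each input, then concatenate along channels as in YOLOv3's top-down path.

    Weights are clipped to be non-negative and normalized to sum to about 1
    (BiFPN-style fast normalized fusion, an assumption here).
    """
    up = deep.repeat(2, axis=1).repeat(2, axis=2)  # nearest-neighbour 2x upsampling
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)
    w = w / (w.sum() + eps)
    return np.concatenate([w[0] * up, w[1] * lateral], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, r = 8, 2                                  # channels and MLP reduction ratio
    x = rng.standard_normal((c, 6, 6))           # stand-in for one backbone feature level
    w1 = 0.1 * rng.standard_normal((c // r, c))  # random stand-ins for learned weights
    w2 = 0.1 * rng.standard_normal((c, c // r))
    k = 0.1 * rng.standard_normal((2, 7, 7))
    print(residual_attention(x, w1, w2, k).shape)  # (8, 6, 6)
    deep = rng.standard_normal((16, 3, 3))         # coarser, deeper feature level
    print(weighted_topdown_fusion(deep, x, [1.0, 3.0]).shape)  # (24, 6, 6)
```

Clipping and normalizing the weights keeps the fused feature on a comparable scale however the weights are set; in a trained detector they would be learnable parameters rather than the fixed values used here.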
Pages: 40873-40889
Number of pages: 17