Sparse attention block: Aggregating contextual information for object detection

Citations: 18
Authors
Chen, Chunlin [1]
Yu, Jun [1]
Ling, Qiang [1]
Affiliations
[1] University of Science and Technology of China, Hefei 230027, People's Republic of China
Keywords
Object detection; Self-attention; Convolution neural network
DOI
10.1016/j.patcog.2021.108418
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
It is well recognized that contextual information from surrounding objects benefits object detection. Such contextual information can often be obtained from long-range dependencies. This paper proposes a sparse attention block that captures long-range dependencies efficiently. Unlike the conventional non-local block, which generates a dense attention map to characterize the dependency between any two positions of the input feature map, the sparse attention block samples only the most representative positions for contextual information aggregation. After searching for local peaks in a heat map of the given input feature map, it adaptively selects a sparse set of positions to represent the relationship between query and key elements. With these sparse positions, the block models long-range dependencies well and greatly improves object detection performance, at an additional cost of less than 2% of the GPU memory and computation required by the conventional non-local block. The sparse attention block can be easily plugged into various object detection frameworks, such as Faster R-CNN, RetinaNet and Mask R-CNN. Experiments on the COCO benchmark confirm that it boosts detection accuracy with significant gains ranging from 1.4% to 1.9%, with negligible overhead in computation and memory usage.
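The pipeline described in the abstract (find local peaks in a heat map, keep a sparse set of positions, attend from every query to those positions only) can be illustrated with a minimal pure-Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the heat map is taken to be the L2 norm of each feature vector, peaks are detected in a 3x3 window, the strongest `k` peaks are kept, no learned projections are used, and the context is added back through a residual sum. The names `local_peaks` and `sparse_attention` are hypothetical.

```python
import math


def local_peaks(heat, k):
    """Return up to k (row, col) positions that are 3x3 local maxima, strongest first."""
    h, w = len(heat), len(heat[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            # Values in the 3x3 window around (r, c), clipped at the borders.
            window = [heat[rr][cc]
                      for rr in range(max(r - 1, 0), min(r + 2, h))
                      for cc in range(max(c - 1, 0), min(c + 2, w))]
            if heat[r][c] >= max(window):
                peaks.append((heat[r][c], r, c))
    peaks.sort(key=lambda p: -p[0])  # strongest responses first
    return [(r, c) for _, r, c in peaks[:k]]


def sparse_attention(x, k=4):
    """x[r][c] is a feature vector (list of floats).

    Returns (output map with aggregated context, sampled key positions).
    """
    h, w, dim = len(x), len(x[0]), len(x[0][0])
    # Assumed heat map: L2 norm of the feature vector at each position.
    heat = [[math.sqrt(sum(v * v for v in x[r][c])) for c in range(w)]
            for r in range(h)]
    positions = local_peaks(heat, k)
    keys = [x[r][c] for r, c in positions]  # K sparse key/value vectors

    out = []
    for r in range(h):
        row = []
        for c in range(w):
            q = x[r][c]
            # Scaled dot-product scores against the K sampled positions only.
            logits = [sum(a * b for a, b in zip(q, key)) / math.sqrt(dim)
                      for key in keys]
            m = max(logits)  # subtract the max for a numerically stable softmax
            exps = [math.exp(l - m) for l in logits]
            total = sum(exps)
            context = [sum(e / total * key[d] for e, key in zip(exps, keys))
                       for d in range(dim)]
            # Residual connection: input feature plus aggregated context.
            row.append([qv + cv for qv, cv in zip(q, context)])
        out.append(row)
    return out, positions
```

For an H x W feature map, a dense attention map of the kind the abstract attributes to the non-local block has (HW)^2 entries, whereas this sketch computes only HW x K scores, where K is the number of sampled positions.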
Pages: 12