Scale-Insensitive Object Detection via Attention Feature Pyramid Transformer Network

Cited by: 0
Authors
Lingling Li
Changwen Zheng
Cunli Mao
Haibo Deng
Taisong Jin
Affiliations
[1] Zhengzhou University of Aeronautics, School of Intelligent Engineering
[2] Institute of Software, Chinese Academy of Sciences
[3] Kunming University of Science and Technology, Yunnan Key Laboratory of Artificial Intelligence, Faculty of Information Engineering and Automation
[4] Beijing Zhonghangzhi Technology Co., Ltd.
[5] Xiamen University, School of Informatics
Source
Neural Processing Letters | 2022 / Vol. 54
Keywords
Object detection; Feature pyramid; Attention; Convolutional network
DOI
Not available
Abstract
With the progress of deep learning, object detection has attracted great attention in the computer vision community. A key challenge in object detection is that object scale often varies over a large range, which can cause existing detectors to fail in real applications. To address this problem, we propose a novel end-to-end Attention Feature Pyramid Transformer Network (AFPN) framework that learns object detectors from multi-scale feature maps in a transformer encoder-decoder fashion. AFPN learns to aggregate pyramid feature maps with attention mechanisms. Specifically, transformer-based attention blocks scan through each spatial location of the feature maps in the same pyramid layer and update it by aggregating information from deep to shallow layers. Furthermore, inter-level feature aggregation and intra-level information attention are repeated to encode multi-scale, self-attentive feature representations. Extensive experiments on the challenging MS COCO object detection dataset demonstrate that the proposed AFPN outperforms its baseline methods, i.e., DETR and Faster R-CNN, and achieves state-of-the-art results.
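The deep-to-shallow aggregation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes single-head scaled dot-product attention with no learned projections, a residual update, and a pyramid whose levels share the same channel count. The function names `cross_level_attention` and `top_down_aggregate` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_level_attention(shallow, deep):
    """Update each shallow location by attending over all deep locations.

    shallow: (Ns, C) flattened locations of the shallower level (queries).
    deep:    (Nd, C) flattened locations of the deeper level (keys/values).
    Assumption: identity projections and a residual connection.
    """
    c = shallow.shape[1]
    scores = shallow @ deep.T / np.sqrt(c)  # (Ns, Nd) similarity scores
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return shallow + weights @ deep         # residual update


def top_down_aggregate(pyramid):
    """pyramid: list of (H, W, C) arrays ordered from shallow to deep."""
    out = [None] * len(pyramid)
    out[-1] = pyramid[-1]                   # deepest level is left unchanged
    for i in range(len(pyramid) - 2, -1, -1):
        h, w, c = pyramid[i].shape
        queries = pyramid[i].reshape(h * w, c)
        keys_values = out[i + 1].reshape(-1, c)
        out[i] = cross_level_attention(queries, keys_values).reshape(h, w, c)
    return out


# Toy three-level pyramid with 8 channels per level.
rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((s, s, 8)) for s in (8, 4, 2)]
fused = top_down_aggregate(pyramid)
```

Each level keeps its spatial resolution while receiving information from the level above it, so no upsampling step is needed in this simplified form. A trained detector would add learned query, key, and value projections, multiple heads, and positional encodings, none of which are specified in the abstract.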
Pages: 581-595
Page count: 14