As vision sensor technology continues to evolve, the requirements for detecting targets of interest in sensor-captured images are increasing. Balancing fast detection against high accuracy, the industry favors solutions based on geometric key points. However, the real world contains a large number of small and blurry objects, and geometric key point detectors do not effectively exploit the contextual features of the region of interest, leading to excessive false positives and false negatives. In this work, a simple, effective, and interpretable tiny object detection method called the Regional Cross Self-Attention Object Detection Network (RCSANet) is proposed. It adopts Region Proposal Networks and transformers to capture region-background relations and uses these relations to generate key point sequences. A regional cross self-attention mechanism is introduced to reduce redundant computation and minimize the interference of irrelevant information with the target region. Additionally, a positional encoding called dynamic implicit position coding is proposed to work with regional cross self-attention; it can encode input sequences of arbitrary length. The computational cost of RCSANet is significantly lower than that of state-of-the-art object detection solutions. Moreover, RCSANet improves performance on four benchmark datasets (MS COCO, TinyPerson, DOTA, and AI-TOD) by about 3.0% AP.