A Recursive Prediction-Based Feature Enhancement for Small Object Detection

Cited by: 0

Authors:

Xiao, Xiang [1]

Xue, Xiaorong [1]

Zhao, Zhiyuan [1]

Fan, Yisheng [1]

Affiliation:

[1] Liaoning Univ Technol, Sch Elect & Informat Engn, Jinzhou 121001, Peoples R China

Source:

SENSORS | 2024, Vol. 24, No. 12

Keywords:

small object detection; SAC; DINO; NWD

DOI:

10.3390/s24123856

Chinese Library Classification:

O65 [Analytical Chemistry];

Discipline classification codes:

070302 ; 081704 ;

Abstract:

Transformer-based methods for object detection have recently attracted considerable interest and produced impressive results. DETR, an end-to-end object detection framework, integrates the Transformer architecture, traditionally used in NLP, into computer vision for sequence-to-sequence prediction. Its enhanced variant, DINO, which features improved denoising anchor boxes, has shown remarkable performance on the COCO val2017 dataset. However, it often struggles in scenarios that involve small object detection. We therefore propose a feature enhancement method tailored to recursive prediction tasks, with a particular emphasis on improving small object detection performance. It involves three enhancements: refining the backbone to favor feature maps that are more sensitive to small targets, progressively increasing the number of queries for small objects, and improving the loss function. Specifically, the study incorporates the Switchable Atrous Convolution (SAC) mechanism, which uses adaptable dilated convolutions, to enlarge the receptive field and thereby strengthen the backbone network's feature extraction for small objects. Next, a Recursive Small Object Prediction (RSP) module is designed to enhance the feature extraction of the prediction head so that the network predicts more precisely. Finally, the loss function is augmented with the Normalized Wasserstein Distance (NWD) metric, which makes it better suited to small object detection. The efficacy of the proposed model is confirmed empirically by testing on the VISDRONE2019 dataset. The experiments indicate that the proposed model outperforms the existing DINO model in terms of average precision (AP) for small object detection.
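To illustrate why an NWD term can help with small objects, the following is a minimal sketch of the commonly cited NWD definition, in which each box (cx, cy, w, h) is modeled as a 2D Gaussian and the second-order Wasserstein distance between two such Gaussians is mapped to a similarity in (0, 1]. This record does not give the paper's exact loss formulation, so the function names `nwd` and `iou`, the box layout, and the normalizing constant `c = 12.8` are assumptions made here for illustration only; in practice `c` is a dataset-dependent value.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two boxes given as (cx, cy, w, h).

    Each box is treated as a Gaussian with mean (cx, cy) and covariance
    diag(w^2 / 4, h^2 / 4). The constant c is an assumed, dataset-dependent
    normalizer, not a value taken from the paper.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two axis-aligned Gaussians.
    w2_sq = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2.0) ** 2
        + ((ha - hb) / 2.0) ** 2
    )
    return math.exp(-math.sqrt(w2_sq) / c)


def iou(box_a, box_b):
    """Plain IoU for boxes given as (cx, cy, w, h), used here for comparison."""
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    inter_w = max(0.0, min(cxa + wa / 2, cxb + wb / 2) - max(cxa - wa / 2, cxb - wb / 2))
    inter_h = max(0.0, min(cya + ha / 2, cyb + hb / 2) - max(cya - ha / 2, cyb - hb / 2))
    inter = inter_w * inter_h
    union = wa * ha + wb * hb - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 4x4 boxes whose centers are 5 pixels apart: they do not overlap.
    a = (10.0, 10.0, 4.0, 4.0)
    b = (15.0, 10.0, 4.0, 4.0)
    print("IoU:", iou(a, b))
    print("NWD:", nwd(a, b))
```

For the two non-overlapping 4x4 boxes in the usage example, IoU is zero and carries no information about how far apart the boxes are, whereas NWD still decreases smoothly with the offset. That smooth behavior is the usual motivation for adding an NWD term when targets are only a few pixels wide.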

Pages: 16
