A Recursive Prediction-Based Feature Enhancement for Small Object Detection

Citations: 0
Authors
Xiao, Xiang [1]
Xue, Xiaorong [1]
Zhao, Zhiyuan [1]
Fan, Yisheng [1]
Affiliations
[1] Liaoning Univ Technol, Sch Elect & Informat Engn, Jinzhou 121001, Peoples R China
Keywords
small object detection; SAC; DINO; NWD
DOI
10.3390/s24123856
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Transformer-based methods for object detection have recently attracted considerable interest and produced impressive results. DETR, an end-to-end object detection framework, integrates the Transformer architecture, traditionally used in NLP, into computer vision for sequence-to-sequence prediction. Its enhanced variant, DINO, featuring improved denoising anchor boxes, has shown remarkable performance on the COCO val2017 dataset. However, it often encounters challenges when applied to scenarios involving small object detection. Thus, we propose a feature enhancement method tailored to recursive prediction tasks, with a particular emphasis on improving small object detection performance. It primarily involves three enhancements: refining the backbone to favor feature maps that are more sensitive to small targets, incrementally increasing the number of queries for small objects, and adapting the loss function for better performance. Specifically, the study incorporates the Switchable Atrous Convolution (SAC) mechanism, which features adaptable dilated convolutions, to enlarge the receptive field and thus strengthen the backbone network's feature extraction capability for small objects. Subsequently, a Recursive Small Object Prediction (RSP) module is designed to enhance the feature extraction of the prediction head for more precise network operations. Finally, the loss function is augmented with the Normalized Wasserstein Distance (NWD) metric, tailoring it to suit small object detection better. The efficacy of the proposed model is empirically confirmed via testing on the VISDRONE2019 dataset. The experiments indicate that the proposed model outperforms the existing DINO model in terms of average precision (AP) for small object detection.
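The abstract names the Normalized Wasserstein Distance (NWD) but does not give its formula. The following is a minimal sketch of the commonly cited definition, not the authors' implementation: each box (cx, cy, w, h) is modeled as a 2D Gaussian, the squared 2-Wasserstein distance between two such Gaussians reduces to a squared Euclidean distance between the vectors (cx, cy, w/2, h/2), and the result is mapped into (0, 1] with an exponential. The function names `nwd` and `nwd_loss` are hypothetical, and the normalizing constant `c` is a dataset-dependent assumption; the value the paper uses for VisDrone is not stated in this record.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    c is an assumed, dataset-dependent normalizing constant.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the Gaussians
    # N((cx, cy), diag(w^2 / 4, h^2 / 4)) fitted to each box.
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    # The exponential maps the distance to a similarity in (0, 1].
    return math.exp(-math.sqrt(w2_squared) / c)


def nwd_loss(pred_box, target_box, c=12.8):
    """Loss term that is 0 for identical boxes and approaches 1 as they diverge."""
    return 1.0 - nwd(pred_box, target_box, c)
```

Unlike IoU, this similarity stays nonzero and varies smoothly even when two small boxes do not overlap at all, which is the usual motivation for using it on small objects.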
Pages: 16
Related papers
52 in total
  • [1] SOD-MTGAN: Small Object Detection via Multi-Task Generative Adversarial Network
    Bai, Yancheng
    Zhang, Yongqiang
    Ding, Mingli
    Ghanem, Bernard
    [J]. COMPUTER VISION - ECCV 2018, PT XIII, 2018, 11217 : 210 - 226
  • [2] Carion Nicolas, 2020, Computer Vision - ECCV 2020. 16th European Conference. Proceedings. Lecture Notes in Computer Science (LNCS 12346), P213, DOI 10.1007/978-3-030-58452-8_13
  • [3] Enhanced Training of Query-Based Object Detection via Selective Query Recollection
    Chen, Fangyi
    Zhang, Han
    Hu, Kai
    Huang, Yu-Kai
    Zhu, Chenchen
    Savvides, Marios
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023 : 23756 - 23765
  • [4] A Survey of the Four Pillars for Small Object Detection: Multiscale Representation, Contextual Information, Super-Resolution, and Region Proposal
    Chen, Guang
    Wang, Haitao
    Chen, Kai
    Li, Zhijun
    Song, Zida
    Liu, Yinlong
    Chen, Wenkai
    Knoll, Alois
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (02) : 936 - 953
  • [5] Dynamic Head: Unifying Object Detection Heads with Attentions
    Dai, Xiyang
    Chen, Yinpeng
    Xiao, Bin
    Chen, Dongdong
    Liu, Mengchen
    Yuan, Lu
    Zhang, Lei
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021 : 7369 - 7378
  • [6] Dosovitskiy A, 2021, Arxiv, DOI arXiv:2010.11929
  • [7] VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results
    Du, Dawei
    Zhu, Pengfei
    Wen, Longyin
    Bian, Xiao
    Ling, Haibin
    Hu, Qinghua
    Peng, Tao
    Zheng, Jiayu
    Wang, Xinyao
    Zhang, Yue
    Bo, Liefeng
    Shi, Hailin
    Zhu, Rui
    Kumar, Aashish
    Li, Aijin
    Zinollayev, Almaz
    Askergaliyev, Anuar
    Schumann, Arne
    Mao, Binjie
    Lee, Byeongwon
    Liu, Chang
    Chen, Changrui
    Pan, Chunhong
    Huo, Chunlei
    Yu, Da
    Cong, Dechun
    Zeng, Dening
    Pailla, Dheeraj Reddy
    Li, Di
    Wang, Dong
    Cho, Donghyeon
    Zhang, Dongyu
    Bai, Furui
    Jose, George
    Gao, Guangyu
    Liu, Guizhong
    Xiong, Haitao
    Qi, Hao
    Wang, Haoran
    Qiu, Heqian
    Li, Hongliang
    Lu, Huchuan
    Kim, Ildoo
    Kim, Jaekyum
    Shen, Jane
    Lee, Jihoon
    Ge, Jing
    Xu, Jingjing
    Zhou, Jingkai
    Meier, Jonas
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019 : 213 - 226
  • [8] The Pascal Visual Object Classes (VOC) Challenge
    Everingham, Mark
    Van Gool, Luc
    Williams, Christopher K. I.
    Winn, John
    Zisserman, Andrew
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2010, 88 (02) : 303 - 338
  • [9] TOOD: Task-aligned One-stage Object Detection
    Feng, Chengjian
    Zhong, Yujie
    Gao, Yu
    Scott, Matthew R.
    Huang, Weilin
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021 : 3490 - 3499
  • [10] Fast R-CNN
    Girshick, Ross
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015 : 1440 - 1448