Pareto Refocusing for Drone-View Object Detection

Cited by: 50

Authors:

Leng, Jiaxu [1,2]

Mo, Mengjingcheng [1,2]

Zhou, Yinghua [1,2]

Gao, Chenqiang [3,4]

Li, Weisheng [1,2]

Gao, Xinbo [1,2]

Affiliations:

[1] Chongqing Univ Posts & Telecommun, Key Lab Image Cognit, Chongqing 400065, Peoples R China

[2] Chongqing Inst Brain & Intelligence, Guangyang Bay Lab, Chongqing 400065, Peoples R China

[3] Chongqing Univ Posts & Telecommun, Sch Commun & Informat Engn, Chongqing 400065, Peoples R China

[4] Chongqing Univ Posts & Telecommun, Chongqing Key Lab Signal & Informat Proc, Chongqing, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2023, Vol. 33, No. 3

Funding:

National Natural Science Foundation of China;

Keywords:

Detectors; Object detection; Task analysis; Feature extraction; Visualization; Drones; Image recognition; Drone-view object detection; pareto refocusing; challenging region prediction; context learning;

DOI:

10.1109/TCSVT.2022.3210207

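Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract: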

Drone-view Object Detection (DOD) is a meaningful but challenging task. Progress has hit a bottleneck for two main reasons: (1) The high proportion of difficult objects (e.g., small or occluded objects) degrades detection performance. (2) The uneven distribution of objects makes detection inefficient. Together, these two factors produce a phenomenon that follows the Pareto principle: a few challenging regions covering a small fraction of the image area have a significant impact on the final detection result, while the vanilla regions covering most of the area have a negligible impact because they leave little room for performance improvement. Motivated by the human visual system, which naturally invests unequal effort in things of differing difficulty to recognize objects effectively, this paper presents a novel Pareto Refocusing Detection (PRDet) network that distinguishes the challenging regions from the vanilla regions under reverse-attention guidance and refocuses on the challenging regions with the assistance of region-specific context. Specifically, we first propose a Reverse-attention Exploration Module (REM) that excavates the potential positions of difficult objects by suppressing the features that are salient to a commonly used detector. Then, we propose a Region-specific Context Learning Module (RCLM) that learns to generate specific contexts for strengthening the understanding of challenging regions. Notably, the specific context is not shared globally but is unique to each challenging region and is obtained by exploring spatial and appearance cues. Extensive experiments and comprehensive evaluations on the VisDrone2021-DET and UAVDT datasets demonstrate that the proposed PRDet effectively improves detection performance, especially for difficult objects, and outperforms state-of-the-art detectors. Furthermore, our method also achieves significant performance improvements on the DTU-Drone dataset for power inspection.
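The reverse-attention idea mentioned in the abstract can be illustrated with a small sketch. It is not the authors' REM: it assumes the common reverse-attention formulation, in which each location is weighted by 1 - sigmoid(s), where s is a saliency logit from a base detector. The function names, the 2x2 grid, and the logit values are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def reverse_attention(saliency_logits):
    """Weight each location by 1 - sigmoid(logit).

    Locations the base detector already finds salient (large logit)
    get weights near 0; overlooked locations get weights near 1.
    """
    return [[1.0 - sigmoid(v) for v in row] for row in saliency_logits]


def suppress_salient(features, saliency_logits):
    """Scale a single-channel feature map by the reverse-attention weights."""
    weights = reverse_attention(saliency_logits)
    return [
        [f * w for f, w in zip(feat_row, weight_row)]
        for feat_row, weight_row in zip(features, weights)
    ]


# Hypothetical 2x2 example: top-left is confidently detected,
# top-right is confidently missed, bottom-left is undecided.
logits = [[4.0, -4.0], [0.0, 2.0]]
features = [[1.0, 1.0], [1.0, 1.0]]
suppressed = suppress_salient(features, logits)
```

In this sketch, the response at the confidently detected location is almost entirely suppressed, while the location the base detector ignored keeps nearly its full response. That is the sense in which suppressing salient features may expose candidate positions of difficult objects.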


Pages: 1320-1334

Page count: 15

