High-Resolution Feature Pyramid Network for Small Object Detection on Drone View

Cited by: 38
Authors
Chen, Zhaodong [1 ,2 ]
Ji, Hongbing [1 ,2 ]
Zhang, Yongquan [1 ,2 ]
Zhu, Zhigang [1 ,2 ]
Li, Yifan [1 ,2 ]
Affiliations
[1] Xidian Univ, Xian Key Lab Intelligent Spectrum Sensing & Inform, Xian 710071, Peoples R China
[2] Xidian Univ, Shaanxi Union Res Ctr Univ & Enterprise Intelligen, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Object detection on drone view; small object detector; high-resolution feature; multiple-in-single-out feature pyramid network; CONTEXT;
DOI
10.1109/TCSVT.2023.3286896
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Discipline classification codes
0808; 0809;
Abstract
Object detection has developed rapidly in recent years with the help of deep learning technologies. However, object detection on drone view remains challenging for two main reasons: (1) it is difficult to detect small-scale objects that lack detailed information, and (2) the diversity of drone camera angles causes dramatic differences in object scale. Although the feature pyramid network (FPN) alleviates the problem caused by scale differences to some extent, it also retains some worthless features, which wastes resources and slows down inference. In this work, we propose a novel High-Resolution Feature Pyramid Network (HR-FPN) to improve the detection accuracy of small-scale objects and avoid feature redundancy. The key components of HR-FPN are a high-resolution feature alignment module (HRFA), a high-resolution feature fusion module (HRFF), and a multi-scale decoupled head (MSDH). HRFA feeds multi-scale features from the backbone into parallel resampling channels to obtain high-resolution features at the same scale. HRFF establishes a bottom-up path that distributes context-rich low-level semantic information to all layers, which are then aggregated into a classification feature and a localization feature. MSDH copes with the scale difference of objects by predicting the categories and locations of objects at different scales separately. Moreover, we train the model with a scale-weighted loss to focus more on small-scale objects. Extensive experiments and comprehensive evaluations demonstrate the effectiveness and superiority of our approach.
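To make the HRFA idea described in the abstract concrete, the following is a minimal PyTorch sketch of aligning multi-scale backbone features to a single high-resolution scale. The abstract does not specify the paper's exact resampling operators, channel widths, or fusion rule, so the module name, hyperparameters, and the simple sum-fusion used here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumption): HRFA-style alignment of multi-scale backbone
# features to one high-resolution map. Layer choices are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HighResFeatureAlign(nn.Module):
    """Resample multi-scale features to a common high-resolution grid."""

    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # One lateral 1x1 conv per backbone level to unify channel width.
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )

    def forward(self, feats):
        # feats: backbone maps ordered from high to low resolution
        # (e.g. strides 4, 8, 16, 32); the largest map sets the target size.
        target_size = feats[0].shape[-2:]
        aligned = []
        for lateral, f in zip(self.laterals, feats):
            x = lateral(f)
            # Upsample coarser levels onto the high-resolution grid.
            if x.shape[-2:] != target_size:
                x = F.interpolate(x, size=target_size, mode="bilinear",
                                  align_corners=False)
            aligned.append(x)
        # Fuse by summation (assumption) and also return per-level aligned maps.
        return torch.stack(aligned, dim=0).sum(dim=0), aligned


if __name__ == "__main__":
    # Fake backbone outputs for a 640x640 input at strides 4/8/16/32.
    feats = [torch.randn(1, c, 640 // s, 640 // s)
             for c, s in zip((256, 512, 1024, 2048), (4, 8, 16, 32))]
    fused, aligned = HighResFeatureAlign()(feats)
    print(fused.shape)  # torch.Size([1, 256, 160, 160])
```

The fused high-resolution feature would then feed a bottom-up fusion path (HRFF) and per-scale decoupled prediction branches (MSDH) in the full HR-FPN design.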
Pages: 475-489
Page count: 15