DoubleM-Net: multi-scale spatial pyramid pooling-fast and multi-path adaptive feature pyramid network for UAV detection

Cited by: 6
Authors
Li, Zhongxu [1, 2]
He, Qihan [2]
Zhao, Hong [3]
Yang, Wenyuan [1, 2]
Affiliations
[1] Minnan Normal Univ, Fujian Key Lab Granular Comp & Applicat, Zhangzhou 363000, Peoples R China
[2] Minnan Normal Univ, Sch Math & Stat, Zhangzhou 363000, Peoples R China
[3] Minnan Normal Univ, Sch Comp Sci, Zhangzhou 363000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Object detection; Feature pyramid networks; Spatial pyramid pooling; Adaptive spatial fusion; UAV
DOI
10.1007/s13042-024-02278-1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Unmanned aerial vehicles (UAVs) are widely used in military, rescue, and traffic detection applications because of their flexibility, low cost, and autonomous flight capabilities. However, owing to the drone's flight altitude and shooting angle, objects in aerial images are smaller, denser, and more complex than those in general images, which degrades detection performance. In this paper, we propose a model for UAV detection called DoubleM-Net, which contains a multi-scale spatial pyramid pooling-fast (MS-SPPF) module and a multi-path adaptive feature pyramid network (MPA-FPN). DoubleM-Net uses the MS-SPPF module to extract feature maps with multiple receptive field sizes; the MPA-FPN module then fuses features from every two adjacent scales, followed by a level-by-level interactive fusion of features. First, with the backbone network as the feature extractor, multiple feature maps of different scales are extracted from the input image. Second, the MS-SPPF module applies pooling kernels of different sizes and repeats the pooling operations at various scales to obtain rich features with multiple receptive fields. Finally, the MPA-FPN module first fuses semantic information between each pair of adjacent scale layers. The top-level features are then passed back down level by level to enhance the lower-level features, enabling interaction and integration of features at different scales. The experimental results show that DoubleM-Net achieves an mAP50-95 of 27.5% on the VisDrone dataset, and 55.0% and 60.4% on the DroneVehicle dataset in RGB and infrared modes, respectively. Our model demonstrates excellent performance in air-to-ground image detection tasks, especially in detecting small objects.
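The pooling idea described in the abstract (kernels of different sizes, each applied repeatedly, with all outputs kept as extra feature channels) can be illustrated with a minimal pure-Python sketch. This is not the authors' MS-SPPF implementation: the function names `max_pool_same` and `multi_kernel_sppf`, the kernel sizes (3, 5), and the repeat count are assumptions chosen only for illustration, and the abstract does not specify the actual configuration.

```python
from typing import List, Sequence

Grid = List[List[float]]


def max_pool_same(x: Grid, k: int) -> Grid:
    """Stride-1 max pooling with an odd k x k window.

    Windows are clipped at the borders, which is equivalent to padding
    with -inf, so the output has the same height and width as the input.
    """
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    h, w, r = len(x), len(x[0]), k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                x[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def multi_kernel_sppf(x: Grid, kernels: Sequence[int] = (3, 5),
                      repeats: int = 2) -> List[Grid]:
    """Hypothetical multi-kernel variant of SPPF-style pooling.

    For each kernel size, pooling is applied `repeats` times in sequence
    and every intermediate map is kept, so the result stacks the input
    with maps of progressively larger receptive fields.
    """
    channels = [x]
    for k in kernels:
        y = x
        for _ in range(repeats):
            y = max_pool_same(y, k)
            channels.append(y)
    return channels


if __name__ == "__main__":
    # A 7 x 7 map with a single activation makes the receptive field visible.
    fmap = [[0.0] * 7 for _ in range(7)]
    fmap[3][3] = 1.0
    for channel in multi_kernel_sppf(fmap):
        print(sum(v for row in channel for v in row))
```

With a single activation in the input, the sum of each output map equals the area the activation has spread to, which shows why repeating a small kernel is a cheap way to reach a large receptive field: two passes of the 3 x 3 pool cover the same region as one pass of the 5 x 5 pool.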
Pages: 5781-5805
Number of pages: 25