A deep neural network for small object detection in complex environments with unmanned aerial vehicle imagery

Cited by: 0

Authors:

Jobaer, Sayed ^{[1]}

Tang, Xue-song ^{[1,2]}

Zhang, Yihong ^{[1,2]}

Affiliations:

[1] Donghua Univ, Coll Informat Sci & Technol, Shanghai 201620, Peoples R China

[2] Donghua Univ, Coll Informat Sci & Technol, Engn Res Ctr Digitized Text & Apparel Technol, Minist Educ, Shanghai 201620, Peoples R China

Source:

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE | 2025 / Vol. 148

Funding:

National Natural Science Foundation of China; Natural Science Foundation of Shanghai;

Keywords:

Small object detection; Deep learning; Computer vision; Unmanned aerial vehicle; Image processing;

DOI:

10.1016/j.engappai.2025.110466

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Deep learning-based object detectors perform effectively on edge devices but struggle with small and flat objects in complex environments, especially under low-light conditions and in high-altitude images captured by unmanned aerial vehicles (UAVs). The primary issue is the pixel similarity between objects and their backgrounds, which makes detection difficult. You only look once (YOLO) algorithms have shown promise where other existing detectors struggle, but they too remain limited in detecting small and flat objects under these conditions. Because suitable datasets covering complex environments and lighting conditions are scarce, the field lacks comprehensive research on detecting small and flat objects in UAV-assisted images. To address these issues, we develop a dataset with nine classes tailored to small object detection (SOD) challenges. We propose a dynamic model based on the you only look once network v5 (version 6.2) architecture to overcome the above-mentioned limitations. We introduce the Luna-enhancement mechanism and four novel modules, which enhance the detector's capacity to detect objects in complex environments. Our approach aims to improve the accuracy and robustness of detecting small and flat objects in complex environments, benefiting applications like aerial surveillance, search and rescue, and autonomous navigation. The experimental results demonstrate that our proposed model achieves a mean average precision (mAP_0.5) of 74.8% on the common objects in context (COCO) dataset, 76.3% on the VisDrone2019 dataset, 90.6% on the dataset for object detection in aerial images (DOTA-v1.5), and 71.5% on our SOD-Dataset, with improvements of 7.7%, 6.9%, 4.4%, and 10.9%, respectively. For mAP_0.5:0.95, the model achieves 57.2%, 58.2%, 68.2%, and 51.7% on the COCO, VisDrone2019, DOTA-v1.5, and SOD-Dataset, with improvements of 5.5%, 16.4%, 3.4%, and 12.1% compared to the baseline algorithm. Furthermore, ablation experiments and visualization analysis provide additional evidence of the importance of each model component. The code and dataset are publicly available at https://github.com/dhuvisionlab/YOLO-SOD.
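Note on the metrics: the abstract reports both mAP_0.5 and mAP_0.5:0.95. The sketch below is not taken from the paper or its repository. It assumes the common COCO-style convention, in which mAP_0.5 counts a detection as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5, and mAP_0.5:0.95 averages over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The function names `iou` and `thresholds_passed` and the example boxes are hypothetical. It illustrates why the stricter metric tends to be harder for small objects: the same one-pixel localization error costs a small box far more IoU than a large one.

```python
import numpy as np

# COCO-style IoU thresholds (assumed convention): 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = np.linspace(0.5, 0.95, 10)


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def thresholds_passed(pred, gt):
    """Count how many of the ten IoU thresholds a prediction satisfies."""
    value = iou(pred, gt)
    return int(sum(value >= t for t in IOU_THRESHOLDS))


# Hypothetical boxes: the same 1-pixel diagonal shift on a small and a large object.
small_gt, small_pred = (0, 0, 10, 10), (1, 1, 11, 11)
large_gt, large_pred = (0, 0, 100, 100), (1, 1, 101, 101)

print(iou(small_pred, small_gt), thresholds_passed(small_pred, small_gt))
print(iou(large_pred, large_gt), thresholds_passed(large_pred, large_gt))
```

Under these assumptions the shifted 10x10 box still counts as a match at IoU 0.5 but fails the stricter thresholds, while the shifted 100x100 box passes all ten.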

Pages: 25

References: 96 in total
