CDYL for infrared and visible light image dense small object detection

Cited by: 11
Authors
Wu, Huixin [1]
Zhu, Yang [1]
Li, Shuqi [2]
Affiliations
[1] North China University of Water Resources and Electric Power, Zhengzhou 450046, People's Republic of China
[2] Xi'an International Studies University, Xi'an 710128, People's Republic of China
Keywords
Object detection; Infrared and visible light; YOLOv8; Computer vision and FliR_Adas_v2 dataset;
DOI
10.1038/s41598-024-54146-1
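Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract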
To address the large number of small, hard-to-detect objects in infrared and visible light images, we propose CDYL (Convolution to Fully Connected-Deformable Convolution You Only Look Once), an object detection algorithm based on the CFC-DC (Convolution to Fully Connected-Deformable Convolution) module. CFC-DC is the core operator of CDYL: it gives the model a large effective receptive field in infrared and visible light images, together with adaptive spatial aggregation conditioned on input and task information. As a result, CDYL relaxes the strict inductive bias of traditional CNNs and combines the long-range dependence of large-kernel convolution with adaptive spatial aggregation. It mines edge and correlation information in images more deeply, which enhances sensitivity to small objects and thereby improves performance in dense small object detection tasks. To improve the ability of the CFC-DC module to perceive image detail, we use the Mish activation function, whose wider minima improve generalization. The effectiveness and generalization of CDYL are evaluated on an infrared image dataset and a UAV image dataset, and CDYL is compared with other state-of-the-art object detection algorithms. Compared to the baseline network YOLOv8l, our model achieves a 3.0% improvement in mAP0.5 on infrared image detection tasks and a 1.1% improvement in mAP0.5 on visible light image detection tasks. The experimental results show that the proposed algorithm achieves superior average precision on both infrared and visible light images while remaining lightweight. Code is publicly available at https://github.com/yangzhu1/CDYL.
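The abstract names the Mish activation function without defining it. As a minimal sketch, assuming the standard definition Mish(x) = x · tanh(softplus(x)), it can be written in plain Python as follows. The function names `softplus` and `mish` are illustrative, and how the activation is wired into the CFC-DC module is not shown in this record.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable form of log(1 + exp(x)); avoids overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # Standard Mish definition: x * tanh(softplus(x)).
    # Smooth and non-monotonic: slightly negative for negative inputs,
    # close to the identity for large positive inputs.
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    for v in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"mish({v:+.1f}) = {mish(v):+.4f}")
```

Unlike ReLU, this function does not zero out negative inputs; it lets small negative values through and is smooth everywhere.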
Pages: 14