POD-YOLO Object Detection Model Based on Bi-directional Dynamic Cross-level Pyramid Network

Citations: 0

Authors:

Zhang, Yu [1]

Ma, Ming [1]

Wang, Zhongxiang [1]

Li, Jing [1]

Sun, Yan [1]

Affiliations:

[1] Shenyang University of Chemical Technology, School of Computer Science and Technology, Shenyang, People's Republic of China

Source:

ENGINEERING LETTERS | 2024 / Volume 32 / Issue 05

Keywords:

Image processing; Object detection; Feature pyramid; Multi-scale; Feature fusion;

DOI:

Not available

Chinese Library Classification number:

T [Industrial Technology];

Discipline classification code:

08;

Abstract:

Existing heavy-backbone object detection models overlook the crucial role of cross-level interactive fusion of feature information in pyramid networks, which leaves them unable to detect occluded or small objects in complex scenes. In this paper, we present a new heavy-neck object detection model, POD-YOLO, based on YOLOv5s. First, we propose the POD-RepC3 module to increase the model's capability to extract multi-layer features. Second, to address the large span of object sizes, we propose a bidirectional partial dynamic fusion module (Bi-PDC) as the detection neck of the pyramid network. This module preserves accurate positioning signals and facilitates cross-level interactive fusion of feature information. Finally, we design the Reparameterized Bi-directional Dynamic Feature Pyramid Network (RepBi-DFPN), a deep feature fusion network that integrates contextual information and enhances both the feature expression and the feature fusion capabilities of the model. Experimental results on the PASCAL VOC dataset show that the proposed method is effective: mAP@0.5 and mAP@0.5:0.95 reach 81.3% and 58.2%, respectively, increases of 2.4% and 4.1% over the original YOLOv5s algorithm. The results also show that the model's performance is competitive with state-of-the-art (SOTA) object detection models. By optimizing the feature fusion capability of the pyramid network, the algorithm effectively reduces false detections and missed detections, and the model's ability to accurately detect multi-scale targets is significantly improved.
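The abstract reports final scores and gains but not the YOLOv5s baseline itself. The sketch below recovers the implied baseline under the assumption that the stated gains are absolute percentage points rather than relative improvements; the abstract does not say which, and the function name `implied_baseline` is hypothetical.

```python
def implied_baseline(reported: float, gain_points: float) -> float:
    """Baseline score implied by a reported score and an absolute gain.

    Assumes the gain is expressed in percentage points (an assumption;
    the abstract does not state whether gains are absolute or relative).
    """
    # Round to one decimal place to match the precision of the abstract.
    return round(reported - gain_points, 1)


# Figures taken from the abstract (PASCAL VOC).
baseline_map50 = implied_baseline(81.3, 2.4)      # mAP@0.5
baseline_map50_95 = implied_baseline(58.2, 4.1)   # mAP@0.5:0.95

print(baseline_map50, baseline_map50_95)
```

If the gains were instead relative improvements, the implied baselines would differ, so these values should be checked against the full paper before being quoted.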

Pages: 995-1003

Page count: 9
