A Decoupled YOLOv5 with Deformable Convolution and Multi-scale Attention

Cited by: 2

Authors:

Yuan, Gui [1]

Liu, Gang [1]

Chen, Jian [1]

Affiliation:

[1] Hubei Univ Technol, Sch Comp, Wuhan 430068, Peoples R China

Source:

KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT I | 2022 / Vol. 13368

Funding:

National Natural Science Foundation of China;

Keywords:

Object detection; Decoupled head; Deformable convolution; Multi-scale attention; YOLOv5;

DOI:

10.1007/978-3-031-10983-6_1

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The YOLO series is a family of classic frameworks in object detection that has achieved remarkable results on general datasets. Among them, YOLOv5, a single-stage multi-scale detector, offers strong accuracy and speed, but it still suffers from inaccurate localization when detecting objects. To address this problem, we propose three improvements to YOLOv5. First, because the classification and regression tasks conflict, we decouple classification and localization in the detection head. Second, because the feature fusion method used by YOLOv5 can cause feature misalignment, we add deformable convolution to automatically align features of different scales. Finally, we apply the proposed multi-scale attention mechanism to the features of adjacent scales to predict a relative weighting between them. Experiments show that our method achieves a mAP0.5 of 85.11% and a mAP0.5:0.95 of 63.33% on the PASCAL VOC dataset.
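The abstract does not give the form of the multi-scale attention, so the following is only a minimal sketch of what "a relative weighting between adjacent scales" could look like, not the authors' implementation. It assumes a coarse map at half the resolution of the fine map, nearest-neighbour upsampling, and a fixed sigmoid scoring function standing in for what would be a small learned convolution. All function names and parameters here are hypothetical.

```python
import math

def upsample_nearest(feat, factor=2):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def relative_weight(fine_val, coarse_val, w_fine=1.0, w_coarse=-1.0, bias=0.0):
    """Hypothetical scoring function: sigmoid of a linear mix of both scales.
    In a real network these coefficients would be learned, not fixed."""
    logit = w_fine * fine_val + w_coarse * coarse_val + bias
    return 1.0 / (1.0 + math.exp(-logit))

def fuse_adjacent_scales(fine, coarse):
    """Blend a fine map with the upsampled coarse map, position by position."""
    up = upsample_nearest(coarse, 2)
    fused = []
    for f_row, c_row in zip(fine, up):
        fused_row = []
        for f, c in zip(f_row, c_row):
            a = relative_weight(f, c)                # weight given to the fine scale
            fused_row.append(a * f + (1.0 - a) * c)  # the two weights sum to 1
        fused.append(fused_row)
    return fused

# Toy single-channel feature maps (assumed values, for illustration only).
fine = [[1.0, 2.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 3.0],
        [2.0, 2.0, 1.0, 1.0],
        [0.0, 0.0, 1.0, 1.0]]
coarse = [[1.0, 0.0],
          [2.0, 1.0]]
fused = fuse_adjacent_scales(fine, coarse)
```

Because the two weights at each position sum to one, every fused value lies between the fine and the upsampled coarse value at that position, which is what makes the weighting relative rather than absolute.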

Pages: 3-14

Page count: 12

