TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios

Cited by: 1377
Authors
Zhu, Xingkui [1 ]
Lyu, Shuchang [1 ]
Wang, Xu [1 ]
Zhao, Qi [1 ]
Affiliations
[1] Beihang Univ, Beijing, Peoples R China
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021) | 2021
Funding
National Natural Science Foundation of China
DOI
10.1109/ICCVW54120.2021.00312
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Object detection in drone-captured scenes is a popular recent task. Because drones navigate at varying altitudes, object scale varies drastically, which burdens network optimization. Moreover, high-speed, low-altitude flight introduces motion blur on densely packed objects, making object distinction highly challenging. To address these two issues, we propose TPH-YOLOv5. Building on YOLOv5, we add one more prediction head to detect objects at different scales. We then replace the original prediction heads with Transformer Prediction Heads (TPH) to exploit the prediction potential of the self-attention mechanism. We also integrate the convolutional block attention module (CBAM) to locate attention regions in scenes with dense objects. To further improve TPH-YOLOv5, we apply a bag of useful strategies such as data augmentation, multi-scale testing, multi-model ensembling, and an extra classifier. Extensive experiments on the VisDrone2021 dataset show that TPH-YOLOv5 performs well, with impressive interpretability, on drone-captured scenes. On the DET-test-challenge dataset, TPH-YOLOv5 achieves an AP of 39.18%, surpassing the previous SOTA method (DPNetV3) by 1.81%. In the VisDrone Challenge 2021, TPH-YOLOv5 wins 5th place and achieves results well matched to the 1st-place model (AP 39.43%). Compared to the baseline model (YOLOv5), TPH-YOLOv5 improves AP by about 7%, which is encouraging and competitive.
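The CBAM block the abstract mentions applies channel attention followed by spatial attention to a feature map. A minimal NumPy sketch of that two-stage idea is given below; it is not the paper's implementation — the shared MLP weights `w1`/`w2` are illustrative, and the learned 7x7 convolution in CBAM's spatial branch is replaced here by a simple average of the pooled maps for brevity (an assumption):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    # x: (C, H, W). A shared two-layer MLP scores both the
    # global-average-pooled and global-max-pooled channel descriptors.
    avg = x.mean(axis=(1, 2))                      # (C,)
    mx = x.max(axis=(1, 2))                        # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # ReLU hidden layer
    scale = sigmoid(mlp(avg) + mlp(mx))            # per-channel weights in (0, 1)
    return x * scale[:, None, None]

def spatial_attention(x):
    # Channel-wise average and max maps; CBAM fuses these with a
    # learned 7x7 conv, approximated here by a plain mean (assumption).
    avg = x.mean(axis=0)                           # (H, W)
    mx = x.max(axis=0)                             # (H, W)
    scale = sigmoid((avg + mx) / 2.0)              # per-location weights in (0, 1)
    return x * scale[None, :, :]

def cbam(x, w1, w2):
    # Channel attention first, then spatial attention, as in CBAM.
    return spatial_attention(channel_attention(x, w1, w2))

# Toy feature map: 4 channels over an 8x8 spatial grid.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w1 = rng.standard_normal((2, 4)) * 0.1  # hidden dim 2 = reduction ratio r=2
w2 = rng.standard_normal((4, 2)) * 0.1
y = cbam(x, w1, w2)
print(y.shape)  # output keeps the input's (C, H, W) shape
```

Because both attention stages multiply the features by weights in (0, 1), the block can only rescale activations, which is why it helps the detector emphasize regions containing densely packed objects rather than add new responses.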
Pages: 2778-2788
Page count: 11