TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios

Cited by: 1377
Authors
Zhu, Xingkui [1 ]
Lyu, Shuchang [1 ]
Wang, Xu [1 ]
Zhao, Qi [1 ]
Affiliations
[1] Beihang Univ, Beijing, Peoples R China
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021) | 2021
Funding
National Natural Science Foundation of China
DOI
10.1109/ICCVW54120.2021.00312
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Object detection in drone-captured scenes is a popular recent task. Because drones navigate at varying altitudes, object scale varies drastically, which burdens network optimization. Moreover, high-speed, low-altitude flight introduces motion blur on densely packed objects, making object distinction highly challenging. To address these two issues, we propose TPH-YOLOv5. Building on YOLOv5, we add one more prediction head to detect objects at different scales. We then replace the original prediction heads with Transformer Prediction Heads (TPH) to exploit the prediction potential of the self-attention mechanism. We also integrate the convolutional block attention module (CBAM) to locate attention regions in scenes with dense objects. To further improve TPH-YOLOv5, we apply a bag of useful strategies such as data augmentation, multi-scale testing, multi-model ensembling, and an extra classifier. Extensive experiments on the VisDrone2021 dataset show that TPH-YOLOv5 performs well, with impressive interpretability, on drone-captured scenes. On the DET-test-challenge dataset, TPH-YOLOv5 achieves an AP of 39.18%, surpassing the previous SOTA method (DPNetV3) by 1.81%. In the VisDrone Challenge 2021, TPH-YOLOv5 wins 5th place and achieves results well matched to the 1st-place model (AP 39.43%). Compared to the baseline model (YOLOv5), TPH-YOLOv5 improves AP by about 7%, which is encouraging and competitive.
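The CBAM block the abstract mentions applies channel attention followed by spatial attention to a feature map. A minimal NumPy sketch of that two-stage idea is given below; it is not the paper's implementation — the shared MLP weights `w1`/`w2` are illustrative, and the learned 7x7 convolution in CBAM's spatial branch is replaced here by a simple average of the pooled maps for brevity (an assumption):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    # x: (C, H, W). A shared two-layer MLP scores both the
    # global-average-pooled and global-max-pooled channel descriptors.
    avg = x.mean(axis=(1, 2))                      # (C,)
    mx = x.max(axis=(1, 2))                        # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # ReLU hidden layer
    scale = sigmoid(mlp(avg) + mlp(mx))            # per-channel weights in (0, 1)
    return x * scale[:, None, None]

def spatial_attention(x):
    # Channel-wise average and max maps; CBAM fuses these with a
    # learned 7x7 conv, approximated here by a plain mean (assumption).
    avg = x.mean(axis=0)                           # (H, W)
    mx = x.max(axis=0)                             # (H, W)
    scale = sigmoid((avg + mx) / 2.0)              # per-location weights in (0, 1)
    return x * scale[None, :, :]

def cbam(x, w1, w2):
    # Channel attention first, then spatial attention, as in CBAM.
    return spatial_attention(channel_attention(x, w1, w2))

# Toy feature map: 4 channels over an 8x8 spatial grid.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w1 = rng.standard_normal((2, 4)) * 0.1  # hidden dim 2 = reduction ratio r=2
w2 = rng.standard_normal((4, 2)) * 0.1
y = cbam(x, w1, w2)
print(y.shape)  # output keeps the input's (C, H, W) shape
```

Because both attention stages multiply the features by weights in (0, 1), the block can only rescale activations, which is why it helps the detector emphasize regions containing densely packed objects rather than add new responses.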
Pages: 2778-2788
Page count: 11