Object Detection Method Based on Saliency Map Fusion for UAV-borne Thermal Images

Cited by: 0
Authors
Zhao X.-K. [1 ]
Li M. [1 ]
Zhang G. [1 ]
Li N. [1 ]
Li J.-S. [1 ]
Affiliations
[1] College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing
Source
Zidonghua Xuebao/Acta Automatica Sinica | 2021, Vol. 47, Issue 09
Funding
National Natural Science Foundation of China
Keywords
Object detection; Saliency map; Thermal image; Unmanned aerial vehicles (UAV); YOLOv3-MobileNetv2;
DOI
10.16383/j.aas.c200021
Abstract
Using thermal images acquired by unmanned aerial vehicles (UAVs) for pedestrian and vehicle detection has great potential in traffic monitoring, intelligent security, disaster prevention, and emergency response. Thermal imaging can clearly capture objects at night or under poor lighting conditions, but thermal images also suffer from low contrast and weak texture features. To address this, this paper proposes to use the saliency map of a thermal image for image enhancement, serving as an attention mechanism for the object detector, and studies how to improve detection performance using only thermal images and their saliency maps. In addition, considering the limited computing power of UAV platforms, a lightweight network, YOLOv3-MobileNetv2, is designed as the object detection model. The YOLOv3 network is trained as the detection benchmark, and BASNet is used to generate saliency maps. Thermal images are fused with their corresponding saliency maps through channel-replacement and pixel-level weighted fusion schemes, and the detection performance of the YOLOv3-MobileNetv2 model under the different schemes is compared experimentally. The results show that, compared with the benchmark, the average precision (AP) for pedestrians and vehicles increases by 6.7% and 5.7%, respectively, the detection speed increases by 60%, and the model size is reduced by 58%. The model provides reliable technical support for the application of thermal images on UAV platforms. Copyright © 2021 Acta Automatica Sinica. All rights reserved.
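The following is a minimal, illustrative sketch of the two fusion schemes mentioned in the abstract (channel replacement and pixel-level weighted fusion). The array shapes, the choice of which channel is overwritten, the weighting factor alpha, and all function names are assumptions made for illustration only and are not taken from the paper.

```python
# Illustrative sketch of saliency-map fusion for a single-channel thermal frame.
# Assumptions (not from the paper): the thermal frame is a uint8 array of shape
# (H, W) replicated into three channels for the detector, the saliency map is a
# float array in [0, 1] of the same spatial size, and alpha is a hypothetical
# weighting factor.
import numpy as np

def channel_replacement_fusion(thermal: np.ndarray, saliency: np.ndarray) -> np.ndarray:
    """Replace one channel of the 3-channel thermal input with the saliency map."""
    fused = np.repeat(thermal[..., None], 3, axis=-1).astype(np.uint8)
    fused[..., 2] = (saliency * 255).astype(np.uint8)  # overwrite the last channel (illustrative choice)
    return fused

def pixel_weighted_fusion(thermal: np.ndarray, saliency: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Pixel-level weighted fusion: brighten salient regions of the thermal image."""
    weighted = thermal.astype(np.float32) * (1.0 - alpha + alpha * saliency)
    weighted = np.clip(weighted, 0, 255).astype(np.uint8)
    return np.repeat(weighted[..., None], 3, axis=-1)

if __name__ == "__main__":
    thermal = np.random.randint(0, 256, (512, 640), dtype=np.uint8)   # dummy thermal frame
    saliency = np.random.rand(512, 640).astype(np.float32)            # dummy saliency map
    print(channel_replacement_fusion(thermal, saliency).shape)  # (512, 640, 3)
    print(pixel_weighted_fusion(thermal, saliency).shape)       # (512, 640, 3)
```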
Pages: 2120-2131
Number of pages: 11
References (31 in total)
[1]  
Liu Zhi-Jia, Jia Peng, Xia Yin-Hui, Lin Yu, Xu Chang-Bin, Development and performance evaluation of infrared and visual image fusion technology, Laser and Infrared, 49, 5, pp. 633-640, (2019)
[2]  
Koch C, Ullman S, Shifts in selective visual attention: Towards the underlying neural circuitry, Human Neurobiology, 4, 4, pp. 219-227, (1985)
[3]  
Redmon J, Farhadi A., YOLOv3: An incremental improvement, arXiv preprint arXiv:1804.02767, (2018)
[4]  
Qin X B, Zhang Z C, Huang C Y, Gao C, Dehghan M, Jagersand M., BASNet: Boundary-aware salient object detection, Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7479-7489, (2019)
[5]  
Sandler M, Howard A, Zhu M L, Zhmoginov A, Chen L C., MobileNetV2: Inverted residuals and linear bottlenecks, Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4510-4520, (2018)
[6]  
Lin T Y, Goyal P, Girshick R, He K M, Dollar P, Focal loss for dense object detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, 42, 2, pp. 318-327, (2020)
[7]  
Lecun Y, Bottou L, Bengio Y, Haffner P, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86, 11, pp. 2278-2324, (1998)
[8]  
Yu Xue-Song, Liu Jia-Feng, Tang Xiang-Long, Huang Jian-Hua, Estimating the pedestrian 3D motion indoor via hybrid tracking model, Acta Automatica Sinica, 36, 4, pp. 610-615, (2010)
[9]  
Dollar P, Tu Z W, Perona P, Belongie S., Integral channel features, Proceedings of the 2009 British Machine Vision Conference (BMVC), pp. 91.1-91.11, (2009)
[10]  
Girshick R, Donahue J, Darrell T, Malik J., Rich feature hierarchies for accurate object detection and semantic segmentation, Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580-587, (2014)