Improved YOLOv7 models based on modulated deformable convolution and swin transformer for object detection in fisheye images

Cited by: 5
Authors
Zhou, Jie [1]
Yang, Degang [1, 2]
Song, Tingting [1]
Ye, Yichen [3]
Zhang, Xin [1]
Song, Yingze [1]
Affiliations
[1] Chongqing Normal Univ, Coll Comp & Informat Sci, Chongqing 401331, Peoples R China
[2] Chongqing Engn Res Ctr Educ Big Data Intelligent P, Chongqing 401331, Peoples R China
[3] Southwest Univ, Coll Elect & Informat Engn, Chongqing 400715, Peoples R China
Keywords
Fisheye image; YOLOv7; Modulated deformable convolution; Swin transformer; Object detection; VISUAL-PERCEPTION; NETWORK;
DOI
10.1016/j.imavis.2024.104966
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Thanks to its wide field of view, a fisheye camera captures much more visual information and is therefore widely used in computer vision. However, fisheye images usually have to be projected before they can be used for object detection. Projection distorts the images, and the discontinuous image edges leave objects incomplete. In addition, objects close to the fisheye lens appear large, while objects farther away appear small. These problems remain challenging for the existing advanced object detector YOLOv7. In this paper, we therefore propose an improved YOLOv7 model. First, Modulated Deformable Convolution is introduced into the YOLOv7 model to adapt automatically to the varying distortion of objects in fisheye images. It not only adjusts the sampling positions of the convolutional kernel but also further extends the deformation range, so the improved model can efficiently extract features of distorted and edge-discontinuous objects. Second, to further improve the detection of small objects in fisheye images, Swin Transformer is also introduced into the YOLOv7 model; the Swin Transformer Block with Window Multi-head Self-Attention (W-MSA) effectively enhances the network's local perception. On the ERP-360 dataset, the proposed model improves mAP by up to 2.4% over the original YOLOv7 model and achieves the best results compared to other state-of-the-art object detection methods for equirectangular projection images. On the VOC-360 dataset, it improves mAP by up to 5.9% over the original YOLOv7 model. The experimental results show that the proposed models achieve good results for object detection in both fisheye images and equirectangular projection images.
The ERP-360 dataset, source code, and pre-trained models for related tasks can be found at https://github.com/xiaoxiaomichong/ERP-360dataset.
Pages: 13
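The abstract says that Modulated Deformable Convolution adjusts the sampling positions of the kernel and extends the deformation range. The sketch below illustrates that general idea for a single output value of a 3x3 kernel: each tap is read at a shifted, fractional position (bilinear interpolation) and scaled by a modulation scalar. It is not the authors' implementation. The function names `bilinear` and `modulated_deform_conv_at` are hypothetical, and the offsets and masks are passed in by hand here, whereas in a real network they would be predicted by a learned layer.

```python
import math


def bilinear(img, y, x):
    """Bilinearly sample a 2D list at fractional (y, x); pixels outside count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * img[yy][xx]
    return value


def modulated_deform_conv_at(img, weights, offsets, masks, cy, cx):
    """One output value of a 3x3 modulated deformable convolution at (cy, cx).

    weights: 9 kernel weights (row-major order)
    offsets: 9 (dy, dx) pairs that shift each sampling position
    masks:   9 modulation scalars, typically in [0, 1]
    """
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for (gy, gx), w, (oy, ox), m in zip(grid, weights, offsets, masks):
        # Regular grid position + offset, then scale by the modulation scalar.
        out += w * m * bilinear(img, cy + gy + oy, cx + gx + ox)
    return out


# Toy input: img[y][x] = 4 * y + x, averaged with a uniform 3x3 kernel.
img = [[4 * y + x for x in range(4)] for y in range(4)]
weights = [1.0 / 9.0] * 9

plain = modulated_deform_conv_at(img, weights, [(0.0, 0.0)] * 9, [1.0] * 9, 1, 1)
shifted = modulated_deform_conv_at(img, weights, [(0.5, 0.5)] * 9, [1.0] * 9, 1, 1)
damped = modulated_deform_conv_at(img, weights, [(0.0, 0.0)] * 9, [0.5] * 9, 1, 1)
print(plain, shifted, damped)
```

With zero offsets and unit masks the function reduces to an ordinary 3x3 convolution. Non-zero offsets move the sampling points, and masks below 1 down-weight individual taps.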