A CNN-Transformer Hybrid Model Based on CSWin Transformer for UAV Image Object Detection

Cited by: 41
Authors
Lu, Wanjie [1]
Lan, Chaozhen [2]
Niu, Chaoyang [1]
Liu, Wei [1]
Lyu, Liang [2]
Shi, Qunshan [2]
Wang, Shiju [1]
Affiliations
[1] PLA Strateg Support Force Informat Engn Univ, Inst Data & Target Engn, Zhengzhou 450001, Peoples R China
[2] PLA Strateg Support Force Informat Engn Univ, Inst Geospatial Informat, Zhengzhou 450001, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Object detection; Transformers; Feature extraction; Detectors; Autonomous aerial vehicles; Computational modeling; Training; Convolutional neural network (CNN); hybrid network; object detection; transformer; unmanned aerial vehicle (UAV) image; NETWORK;
DOI
10.1109/JSTARS.2023.3234161
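Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract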
Object detection in unmanned aerial vehicle (UAV) images has widespread applications in numerous fields; however, the complex backgrounds, diverse object scales, and uneven distribution of objects in UAV images make it a challenging task. This study proposes a convolutional neural network (CNN)-transformer hybrid model for efficient object detection in UAV images, with three features that contribute to detection performance. First, the cross-shaped window (CSWin) transformer is used as the backbone to obtain image features at different levels, and these features are fed into a feature pyramid network to achieve a multiscale representation, which benefits multiscale object detection. Second, a hybrid patch embedding module is constructed to extract and utilize low-level information such as the edges and corners of the image. Finally, a slicing-based inference method fuses the inference results of the original image and of sliced images, which improves small-object detection accuracy without modifying the original network. Experimental results on public datasets show that the proposed method achieves better performance than several popular and state-of-the-art object detection methods.
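The slicing-based inference idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the tile size, the overlap, or the fusion rule, so everything below is an assumption made for illustration: the function names (`slice_windows`, `fuse_detections`) are hypothetical, the detector is left out entirely, and class-agnostic greedy non-maximum suppression (NMS) stands in for whatever fusion the paper actually uses.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]   # (x1, y1, x2, y2, score)
Window = Tuple[int, int, int, int]               # (x0, y0, x1, y1) in full-image pixels


def slice_windows(width: int, height: int, tile: int, overlap: float) -> List[Window]:
    """Tile the image with overlapping square windows (assumed slicing scheme)."""
    step = max(1, int(tile * (1.0 - overlap)))

    def starts(size: int) -> List[int]:
        pos = list(range(0, max(size - tile, 0) + 1, step))
        if pos[-1] + tile < size:        # make sure the far border is covered
            pos.append(size - tile)
        return pos

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_detections(full_dets: List[Box],
                    sliced_dets: List[Tuple[Window, List[Box]]],
                    iou_thr: float = 0.5) -> List[Box]:
    """Map tile-local boxes to full-image coordinates, then merge with greedy NMS."""
    candidates = list(full_dets)
    for (x0, y0, _, _), dets in sliced_dets:
        for x1, y1, x2, y2, score in dets:
            candidates.append((x1 + x0, y1 + y0, x2 + x0, y2 + y0, score))

    candidates.sort(key=lambda b: b[4], reverse=True)   # highest score first
    kept: List[Box] = []
    for box in candidates:
        if all(iou(box, k) < iou_thr for k in kept):
            kept.append(box)
    return kept
```

In use, the same unmodified detector would be run once on the full image and once on each window returned by `slice_windows`; the per-tile results, paired with their windows, are then passed to `fuse_detections`. Small objects occupy a larger share of a tile than of the full image, which is the intuition behind the accuracy gain the abstract reports.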
Pages: 1211-1231
Page count: 21
Related papers
50 in total
  • [21] Hybrid CNN-Transformer Model for Accurate Impacted Tooth Detection in Panoramic Radiographs
    Kucuk, Deniz Bora
    Imak, Andac
    Ozcelik, Salih Taha Alperen
    Celebi, Adalet
    Turkoglu, Muammer
    Sengur, Abdulkadir
    Koundal, Deepika
    DIAGNOSTICS, 2025, 15 (03)
  • [22] TransHSI: A Hybrid CNN-Transformer Method for Disjoint Sample-Based Hyperspectral Image Classification
    Zhang, Ping
    Yu, Haiyang
    Li, Pengao
    Wang, Ruili
    REMOTE SENSING, 2023, 15 (22)
  • [23] A CNN-Transformer Combined Remote Sensing Imagery Spatiotemporal Fusion Model
    Jiang, Mingyu
    Shao, Hua
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 13995 - 14009
  • [24] Hybrid Multiscale SAR Ship Detector With CNN-Transformer and Adaptive Fusion Loss
    Wang, Fei
    Chen, Chengcheng
    Zeng, Weiming
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21
  • [25] Pairwise CNN-Transformer Features for Human-Object Interaction Detection
    Quan, Hutuo
    Lai, Huicheng
    Gao, Guxue
    Ma, Jun
    Li, Junkai
    Chen, Dongji
    ENTROPY, 2024, 26 (03)
  • [26] Infrared and Visible Image Fusion Based on Autoencoder Composed of CNN-Transformer
    Wang, Hongmei
    Li, Lin
    Li, Chenkai
    Lu, Xuanyu
    IEEE ACCESS, 2023, 11 : 78956 - 78969
  • [27] Hybrid CNN-Transformer model for medical image segmentation with pyramid convolution and multi-layer perceptron
    Liu, Xiaowei
    Hu, Yikun
    Chen, Jianguo
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 86
  • [28] Hybrid CNN-Transformer Architecture With Xception-Based Feature Enhancement for Accurate Breast Cancer Classification
    Zeynali, Alireza
    Tinati, Mohammad Ali
    Tazehkand, Behzad Mozaffari
    IEEE ACCESS, 2024, 12 : 189477 - 189493
  • [29] HCTNet: A hybrid CNN-transformer network for breast ultrasound image segmentation
    He, Qiqi
    Yang, Qiuju
    Xie, Minghao
    COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 155
  • [30] Robust Image Forgery Localization Using Hybrid CNN-Transformer Synergy Based Framework
    Sharma, Sachin
    Singh, Brajesh Kumar
    Garg, Hitendra
    CMC-COMPUTERS MATERIALS & CONTINUA, 2025, 82 (03): 4691 - 4708