A CNN-Transformer Hybrid Model Based on CSWin Transformer for UAV Image Object Detection

Cited by: 41
Authors
Lu, Wanjie [1]
Lan, Chaozhen [2]
Niu, Chaoyang [1]
Liu, Wei [1]
Lyu, Liang [2]
Shi, Qunshan [2]
Wang, Shiju [1]
Affiliations
[1] PLA Strateg Support Force Informat Engn Univ, Inst Data & Target Engn, Zhengzhou 450001, Peoples R China
[2] PLA Strateg Support Force Informat Engn Univ, Inst Geospatial Informat, Zhengzhou 450001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Object detection; Transformers; Feature extraction; Detectors; Autonomous aerial vehicles; Computational modeling; Training; Convolutional neural network (CNN); hybrid network; object detection; transformer; unmanned aerial vehicle (UAV) image; NETWORK;
DOI
10.1109/JSTARS.2023.3234161
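Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract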
The object detection of unmanned aerial vehicle (UAV) images has widespread applications in numerous fields; however, the complex background, diverse scales, and uneven distribution of objects in UAV images make object detection a challenging task. This study proposes a convolutional neural network (CNN)-transformer hybrid model to achieve efficient object detection in UAV images, which has three advantages that contribute to improving object detection performance. First, the efficient and effective cross-shaped window (CSWin) transformer can be used as a backbone to obtain image features at different levels, and the obtained features can be input into the feature pyramid network to achieve multiscale representation, which will contribute to multiscale object detection. Second, a hybrid patch embedding module is constructed to extract and utilize low-level information such as the edges and corners of the image. Finally, a slicing-based inference method is constructed to fuse the inference results of the original image and sliced images, which will improve the small object detection accuracy without modifying the original network. Experimental results on public datasets illustrate that the proposed method can improve performance more effectively than several popular and state-of-the-art object detection methods.
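The slicing-based inference step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the results are fused, so class-agnostic non-maximum suppression (NMS) is swapped in here as an assumption. The names `make_slices`, `fused_inference`, and the `detect` callable, as well as the slice size, overlap, and IoU threshold, are all hypothetical. `detect` stands in for any detector that takes a region of the image and returns boxes in that region's local coordinates.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float, float]  # x0, y0, x1, y1, score
Region = Tuple[int, int, int, int]              # x0, y0, x1, y1 in full-image pixels


def make_slices(width: int, height: int, size: int, overlap: float) -> List[Region]:
    """Tile the image with overlapping windows, clipped to the image bounds."""
    step = max(1, int(size * (1.0 - overlap)))
    xs = list(range(0, max(width - size, 0) + 1, step))
    ys = list(range(0, max(height - size, 0) + 1, step))
    # Add a final window if the regular grid misses the right or bottom border.
    if xs[-1] + size < width:
        xs.append(width - size)
    if ys[-1] + size < height:
        ys.append(height - size)
    return [(x, y, min(x + size, width), min(y + size, height)) for y in ys for x in xs]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (scores are ignored)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_thr: float) -> List[Box]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thr for k in kept):
            kept.append(box)
    return kept


def fused_inference(
    width: int,
    height: int,
    detect: Callable[[Region], List[Box]],  # hypothetical detector interface
    size: int = 400,
    overlap: float = 0.25,
    iou_thr: float = 0.5,
) -> List[Box]:
    """Run the detector on the full image and on each slice, then merge."""
    # Full-image pass: local coordinates already equal global coordinates.
    candidates = list(detect((0, 0, width, height)))
    # Slice passes: shift each box back into full-image coordinates.
    for region in make_slices(width, height, size, overlap):
        off_x, off_y = region[0], region[1]
        for x0, y0, x1, y1, score in detect(region):
            candidates.append((x0 + off_x, y0 + off_y, x1 + off_x, y1 + off_y, score))
    # Assumed fusion rule: class-agnostic NMS over all candidates.
    return nms(candidates, iou_thr)
```

Because the detector itself is only called, never changed, this kind of wrapper matches the abstract's point that small-object accuracy can be improved "without modifying the original network". A real pipeline would typically run NMS per class and may use a different merging rule.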
Pages: 1211-1231
Number of pages: 21
Related papers
50 in total
  • [31] A CNN-transformer hybrid approach for an intrusion detection system in advanced metering infrastructure
    Yao, Ruizhe
    Wang, Ning
    Chen, Peng
    Ma, Di
    Sheng, Xianjun
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (13) : 19463 - 19486
  • [32] CNN-TransNet: A Hybrid CNN-Transformer Network With Differential Feature Enhancement for Cloud Detection
    Ma, Nan
    Sun, Lin
    He, Yawen
    Zhou, Chenghu
    Dong, Chuanxiang
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
  • [33] HCT-net: hybrid CNN-transformer model based on a neural architecture search network for medical image segmentation
    Zhihong Yu
    Feifei Lee
    Qiu Chen
    Applied Intelligence, 2023, 53 : 19990 - 20006
  • [34] HCT-net: hybrid CNN-transformer model based on a neural architecture search network for medical image segmentation
    Yu, Zhihong
    Lee, Feifei
    Chen, Qiu
    APPLIED INTELLIGENCE, 2023, 53 (17) : 19990 - 20006
  • [35] Rethinking Image Deblurring via CNN-Transformer Multiscale Hybrid Architecture
    Zhao, Qian
    Yang, Hao
    Zhou, Dongming
    Cao, Jinde
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72
  • [36] An Object Detection Model for Power Lines With Occlusions Combining CNN and Transformer
    Shi, Weicheng
    Lyu, Xiaoqin
    Han, Lei
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2025, 74
  • [37] Image Deblurring Based on an Improved CNN-Transformer Combination Network
    Chen, Xiaolin
    Wan, Yuanyuan
    Wang, Donghe
    Wang, Yuqing
    APPLIED SCIENCES-BASEL, 2023, 13 (01):
  • [38] CMTNet: a hybrid CNN-transformer network for UAV-based hyperspectral crop classification in precision agriculture
    Xihong Guo
    Quan Feng
    Faxu Guo
    Scientific Reports, 15 (1)
  • [39] Remote sensing object detection based on a combination of a CNN and the Swin transformer
    Yang, Liu
    Liang, Junhong
    Guo, Liang
    Long, Yang
    Ding, Kaiyan
    He, Qingfang
    Zhang, Zhihang
    REMOTE SENSING LETTERS, 2023, 14 (05) : 450 - 460
  • [40] A novel hybrid CNN-Transformer model for EEG Motor Imagery classification
    Ma, Yaxin
    Song, Yonghao
    Gao, Fei
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,