A CNN-Transformer Hybrid Model Based on CSWin Transformer for UAV Image Object Detection

Cited by: 41
Authors
Lu, Wanjie [1]
Lan, Chaozhen [2]
Niu, Chaoyang [1]
Liu, Wei [1]
Lyu, Liang [2]
Shi, Qunshan [2]
Wang, Shiju [1]
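Affiliations
[1] PLA Strategic Support Force Information Engineering University, Institute of Data and Target Engineering, Zhengzhou 450001, People's Republic of China
[2] PLA Strategic Support Force Information Engineering University, Institute of Geospatial Information, Zhengzhou 450001, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords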
Object detection; Transformers; Feature extraction; Detectors; Autonomous aerial vehicles; Computational modeling; Training; Convolutional neural network (CNN); hybrid network; object detection; transformer; unmanned aerial vehicle (UAV) image; NETWORK;
DOI
10.1109/JSTARS.2023.3234161
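Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract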
Object detection in unmanned aerial vehicle (UAV) images has widespread applications in numerous fields; however, the complex backgrounds, diverse scales, and uneven distribution of objects in UAV images make it a challenging task. This study proposes a convolutional neural network (CNN)-transformer hybrid model for efficient object detection in UAV images, with three features that contribute to detection performance. First, the cross-shaped window (CSWin) transformer is used as a backbone to obtain image features at different levels, and these features are fed into a feature pyramid network to build a multiscale representation, which supports multiscale object detection. Second, a hybrid patch embedding module is constructed to extract and utilize low-level information such as the edges and corners of the image. Finally, a slicing-based inference method is constructed to fuse the inference results of the original image and sliced images, which improves small-object detection accuracy without modifying the original network. Experimental results on public datasets show that the proposed method achieves better performance than several popular and state-of-the-art object detection methods.
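The slicing-based inference idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract says only that results from the original image and the sliced images are fused, so the use of greedy non-maximum suppression (NMS) as the fusion step, the slice size, the overlap ratio, and the names `make_slices`, `sliced_inference`, and `detect` are all assumptions made for illustration. `detect` stands in for any detector that takes a window `(x1, y1, x2, y2)` and returns boxes in window-local coordinates.

```python
from typing import Callable, List, Tuple

Window = Tuple[int, int, int, int]              # x1, y1, x2, y2 in image pixels
Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score


def make_slices(width: int, height: int, size: int, overlap: float) -> List[Window]:
    """Return overlapping square windows that together cover the image."""
    step = max(1, int(size * (1 - overlap)))

    def starts(length: int) -> List[int]:
        if length <= size:
            return [0]
        offsets = list(range(0, length - size, step))
        offsets.append(length - size)  # last window is flush with the border
        return offsets

    return [(x, y, min(x + size, width), min(y + size, height))
            for y in starts(height) for x in starts(width)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (score field ignored)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], thresh: float) -> List[Box]:
    """Greedy NMS: keep the highest-scoring box among heavy overlaps."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < thresh for k in kept):
            kept.append(box)
    return kept


def sliced_inference(detect: Callable[[Window], List[Box]],
                     width: int, height: int,
                     size: int = 512, overlap: float = 0.25,
                     thresh: float = 0.5) -> List[Box]:
    """Run the (hypothetical) detector on the full image and on each slice,
    shift slice detections back to image coordinates, then fuse with NMS."""
    windows = [(0, 0, width, height)] + make_slices(width, height, size, overlap)
    merged: List[Box] = []
    for (sx, sy, ex, ey) in windows:
        for (x1, y1, x2, y2, score) in detect((sx, sy, ex, ey)):
            merged.append((x1 + sx, y1 + sy, x2 + sx, y2 + sy, score))
    return nms(merged, thresh)
```

Because the detector is only called on different crops, the network itself is unchanged, which matches the abstract's point that small-object accuracy is improved without modifying the original network. In practice each slice would typically be resized before detection so that small objects occupy more pixels; that step is omitted here.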
Pages: 1211-1231
Page count: 21