A CNN-Transformer Hybrid Model Based on CSWin Transformer for UAV Image Object Detection

Cited by: 41

Authors:

Lu, Wanjie [1]

Lan, Chaozhen [2]

Niu, Chaoyang [1]

Liu, Wei [1]

Lyu, Liang [2]

Shi, Qunshan [2]

Wang, Shiju [1]

Affiliations:

[1] PLA Strateg Support Force Informat Engn Univ, Inst Data & Target Engn, Zhengzhou 450001, Peoples R China

[2] PLA Strateg Support Force Informat Engn Univ, Inst Geospatial Informat, Zhengzhou 450001, Peoples R China

Source:

IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING | 2023 / Vol. 16

Funding:

National Natural Science Foundation of China;

Keywords:

Object detection; Transformers; Feature extraction; Detectors; Autonomous aerial vehicles; Computational modeling; Training; Convolutional neural network (CNN); hybrid network; object detection; transformer; unmanned aerial vehicle (UAV) image; NETWORK;

DOI:

10.1109/JSTARS.2023.3234161

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Object detection in unmanned aerial vehicle (UAV) images has widespread applications in numerous fields; however, the complex backgrounds, diverse scales, and uneven distribution of objects in UAV images make it a challenging task. This study proposes a convolutional neural network (CNN)-transformer hybrid model for efficient object detection in UAV images, with three features that contribute to improved detection performance. First, the cross-shaped window (CSWin) transformer is used as a backbone to obtain image features at different levels, and these features are fed into a feature pyramid network to achieve a multiscale representation, which benefits multiscale object detection. Second, a hybrid patch embedding module is constructed to extract and utilize low-level information such as the edges and corners of the image. Finally, a slicing-based inference method is constructed to fuse the inference results of the original image and of sliced images, which improves small-object detection accuracy without modifying the original network. Experimental results on public datasets show that the proposed method outperforms several popular and state-of-the-art object detection methods.
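
The slicing-based inference idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how slices are laid out or how results are fused, so everything here is an assumption: overlapping square tiles, a hypothetical `detect` callable that takes a crop window and returns boxes in window-local coordinates, and plain greedy non-maximum suppression (NMS) as the fusion step. It is not the authors' implementation.

```python
from typing import Callable, List, Tuple

Window = Tuple[int, int, int, int]            # x0, y0, x1, y1 in image coordinates
Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score


def slice_windows(width: int, height: int, tile: int, overlap: float) -> List[Window]:
    """Return overlapping tile windows that together cover the whole image."""
    step = max(1, int(tile * (1 - overlap)))

    def starts(size: int) -> List[int]:
        if size <= tile:
            return [0]
        s = list(range(0, size - tile, step))
        s.append(size - tile)  # last tile is aligned to the image border
        return s

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (score field is ignored)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_thr: float) -> List[Box]:
    """Greedy NMS: keep the highest-scoring box of each overlapping group."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_thr for k in kept):
            kept.append(box)
    return kept


def sliced_inference(detect: Callable[[Window], List[Box]],
                     width: int, height: int,
                     tile: int = 512, overlap: float = 0.2,
                     iou_thr: float = 0.5) -> List[Box]:
    """Fuse full-image detections with per-slice detections (assumed fusion: NMS)."""
    # Detections on the original image are already in image coordinates.
    merged: List[Box] = list(detect((0, 0, width, height)))
    for (x0, y0, x1, y1) in slice_windows(width, height, tile, overlap):
        for (bx1, by1, bx2, by2, score) in detect((x0, y0, x1, y1)):
            # Shift slice-local boxes back into image coordinates.
            merged.append((bx1 + x0, by1 + y0, bx2 + x0, by2 + y0, score))
    return nms(merged, iou_thr)
```

Because the detector is only called on crops and on the original image, a scheme like this leaves the network itself unchanged, which matches the property the abstract highlights; the tile size, overlap ratio, and IoU threshold are illustrative defaults rather than values from the paper.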


Pages: 1211-1231

Page count: 21
