Synthetic aperture radar image ship classification based on ViT-CNN hybrid network

Cited by: 0
Authors
Shao, Ran [1 ,2 ]
Bi, Xiaojun [3 ,4 ]
Affiliations
[1] College of Information and Communication Engineering, Harbin Engineering University, Harbin
[2] College of Electronic and Information Engineering, Harbin Vocational & Technical College, Harbin
[3] Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing
[4] School of Information Engineering, Minzu University of China, Beijing
Source
Harbin Gongcheng Daxue Xuebao / Journal of Harbin Engineering University, 2024, Vol. 45, No. 8
Keywords
convolutional neural network; deep learning; global feature; local feature; parameter sharing; ship image; synthetic aperture radar image; vision transformer;
DOI
10.11990/jheu.202312026
Abstract
In recent years, the vision transformer (ViT) has achieved significant breakthroughs in image classification. However, it adapts poorly to synthetic aperture radar (SAR) image ship classification because it cannot capture multi-scale and local features. To address this, this paper proposes a ViT-CNN hybrid network model for SAR image ship classification. A staged downsampling network structure is designed to overcome ViT's inability to capture multi-scale features. Convolutional structures are incorporated into three core modules of the ViT model, yielding a convolutional token embedding module, a convolutional parameter-sharing attention module, and a local feed-forward network. These modules enable the network to capture both the global and local features of ship images and strengthen its inductive bias and feature extraction ability. Experimental results on two widely used SAR ship image datasets, OpenSARShip and FUSAR-Ship, show that the proposed model improves classification accuracy by 2.96% and 4.18%, respectively, over the best existing method, effectively improving SAR image ship classification performance. © 2024 Editorial Board of Journal of Harbin Engineering. All rights reserved.
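To make the three hybrid modules named in the abstract concrete, the following is a minimal PyTorch-style sketch written for illustration only: the module names, kernel sizes, and the particular way convolution parameters are shared are assumptions, not the authors' published implementation.

```python
# Illustrative sketch (assumed, not the authors' code) of the three hybrid modules
# described in the abstract: convolutional token embedding, convolutional
# parameter-sharing attention, and a local feed-forward network.
import torch
import torch.nn as nn


class ConvTokenEmbedding(nn.Module):
    """Strided convolution that tokenizes and downsamples the feature map,
    providing the staged multi-scale structure the abstract describes."""
    def __init__(self, in_ch, dim, stride=2):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=3, stride=stride, padding=1)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                               # x: (B, C, H, W)
        x = self.proj(x)                                # (B, dim, H/s, W/s)
        B, D, H, W = x.shape
        return self.norm(x.flatten(2).transpose(1, 2)), (H, W)   # tokens: (B, N, D)


class ConvSharedAttention(nn.Module):
    """Multi-head self-attention preceded by ONE depthwise convolution that is
    reused ahead of the Q, K and V projections (one guess at what
    'convolutional parameter-sharing attention' refers to)."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.shared_dw = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)   # shared weights
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.out = nn.Linear(dim, dim)

    def forward(self, tokens, hw):
        B, N, D = tokens.shape
        H, W = hw
        x = tokens.transpose(1, 2).reshape(B, D, H, W)
        x = self.shared_dw(x).flatten(2).transpose(1, 2)          # inject local context
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (t.reshape(B, N, self.num_heads, -1).transpose(1, 2) for t in (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.out(out)


class LocalFeedForward(nn.Module):
    """Transformer FFN with a depthwise convolution between the two linear
    layers so the block also mixes neighbouring tokens (local features)."""
    def __init__(self, dim, expansion=4):
        super().__init__()
        hidden = dim * expansion
        self.fc1 = nn.Linear(dim, hidden)
        self.dw = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.act = nn.GELU()

    def forward(self, tokens, hw):
        B, N, _ = tokens.shape
        H, W = hw
        x = self.act(self.fc1(tokens))
        x = x.transpose(1, 2).reshape(B, -1, H, W)
        x = self.act(self.dw(x)).flatten(2).transpose(1, 2)
        return self.fc2(x)


if __name__ == "__main__":
    chips = torch.randn(2, 1, 64, 64)                   # toy single-channel SAR chips
    embed, attn, ffn = ConvTokenEmbedding(1, 96), ConvSharedAttention(96), LocalFeedForward(96)
    tokens, hw = embed(chips)
    tokens = tokens + attn(tokens, hw)                  # one hybrid encoder block
    tokens = tokens + ffn(tokens, hw)
    print(tokens.shape)                                 # torch.Size([2, 1024, 96])
```

Stacking such blocks across several downsampling stages would give the staged multi-scale structure the abstract describes; the paper's actual stage counts and dimensions are not reproduced here.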
Pages: 1616-1623
Number of pages: 7