MPT-SFANet: Multiorder Pooling Transformer-Based Semantic Feature Aggregation Network for SAR Image Classification

Cited by: 0

Authors:

Ni, Kang ^{[1,2,3]}

Yuan, Chunyang ^{[4]}

Zheng, Zhizhong ^{[3,5]}

Zhang, Bingbing ^{[6]}

Wang, Peng ^{[7]}

Affiliations:

[1] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing, Peoples R China

[2] Minist Educ, Nanjing, Peoples R China

[3] Jiangsu Prov Engn Res Ctr Airborne Detecting & In, Nanjing, Peoples R China

[4] Nanjing Univ Posts & Telecommun, Comp Sci & Technol, Nanjing, Peoples R China

[5] Nanjing Univ Posts & Telecommun, Nanjing, Peoples R China

[6] Dalian Minzu Univ, Sch Comp & Engn, Dalian, Peoples R China

[7] Nanjing Univ Aeronaut & Astronaut, Minist Educ, Nanjing, Peoples R China

Source:

IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS | 2024 / Vol. 60 / Issue 04

Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

Transformers; Radar polarimetry; Synthetic aperture radar; Semantics; Feature extraction; Telecommunications; Land surface; Feature learning; image classification; semantic feature; synthetic aperture radar (SAR); transformer-based method; COVARIANCE;

DOI:

10.1109/TAES.2024.3382622

Chinese Library Classification (CLC) number:

V [Aviation, Aerospace];

Discipline classification code:

08 ; 0825 ;

Abstract:

Transformer-based methods have demonstrated remarkable advancements in synthetic aperture radar (SAR) image classification. Nevertheless, many of these methods ignore global statistical information and semantic feature interaction, both of which help characterize different SAR land covers under complex structures. Second-order statistics are an effective way to characterize the statistical features of SAR patches. Motivated by this, we integrate pyramid pooling and global covariance pooling into each multihead self-attention block, yielding a multiorder pooling transformer module that extracts powerful contextual features and the global statistical properties of SAR patches. In addition, a semantic feature aggregation module captures local deep features and models the interaction of feature information across feature levels. Both modules are embedded into a U-shaped architecture, which we refer to as the multiorder pooling transformer-based semantic feature aggregation network (MPT-SFANet). Extensive experiments on TerraSAR, Sentinel-1B, and GF-3 SAR image classification datasets indicate that MPT-SFANet outperforms several related methods.
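The second-order statistic mentioned in the abstract can be illustrated with a minimal sketch of global covariance pooling. This is not the authors' implementation: the function name `global_covariance_pooling`, the token layout (N tokens, each with C channel values), and the toy input are assumptions chosen only for illustration. The pooled descriptor is the C x C covariance matrix of the channel responses, which records how channels co-vary across a patch rather than only their mean.

```python
from statistics import fmean


def global_covariance_pooling(tokens):
    """Return the C x C covariance matrix of N tokens with C channels each.

    Hypothetical helper for illustration; normalizes by N (biased estimate).
    """
    n_tokens = len(tokens)
    n_channels = len(tokens[0])
    # Per-channel mean over all tokens (the first-order statistic).
    means = [fmean(row[c] for row in tokens) for c in range(n_channels)]
    # Center every token so the covariance measures deviation from the mean.
    centered = [[row[c] - means[c] for c in range(n_channels)] for row in tokens]
    # Entry (i, j) is the average product of centered channels i and j.
    return [
        [sum(r[i] * r[j] for r in centered) / n_tokens for j in range(n_channels)]
        for i in range(n_channels)
    ]


# Toy input (assumed): 3 tokens with 2 channels each.
tokens = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
cov = global_covariance_pooling(tokens)
```

The result is symmetric, so a practical pipeline would typically keep only the upper triangle; how the paper normalizes or embeds this matrix inside the attention block is not stated in the abstract.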

Pages: 4923-4938

Page count: 16
