Remote Sensing Scene Classification via Second-Order Differentiable Token Transformer Network

Cited by: 0

Authors:

Ni, Kang [1,2,3]

Wu, Qianqian [1]

Li, Sichan [4]

Zheng, Zhizhong [1,2]

Wang, Peng [3]

Affiliations:

[1] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing 210023, Peoples R China

[2] Jiangsu Prov Engn Res Ctr Airborne Detecting & Int, Nanjing 210049, Peoples R China

[3] Nanjing Univ Aeronaut & Astronaut, Key Lab Radar Imaging & Microwave Photon, Minist Educ, Nanjing 211106, Peoples R China

[4] Nanjing Univ Posts & Telecommun, Coll Internet Things, Nanjing 210023, Peoples R China

Source:

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024, Vol. 62

Funding:

National Natural Science Foundation of China;

Keywords:

Transformers; Remote sensing; Image coding; Merging; Representation learning; Visualization; Vectors; Classification token; learnable token; remote sensing; scene classification; vision transformer;

DOI:

10.1109/TGRS.2024.3407879

Chinese Library Classification (CLC) number:

P3 [Geophysics]; P59 [Geochemistry];

Discipline classification code:

0708 ; 070902 ;

Abstract:

The vision transformer has been widely applied to remote sensing image scene classification because of its excellent ability to capture global features. However, remote sensing scene images involve challenges such as scene complexity and small interclass differences, and directly using all global tokens of the transformer for feature learning may increase computational complexity. Constructing a distinguishable transformer network that adaptively selects tokens can therefore improve the classification performance on remote sensing scene images while keeping computational complexity in check. On this basis, a second-order differentiable token transformer network (SDT2Net) is proposed that exploits distinguishable statistical features and nonredundant learnable tokens of remote sensing scene images. A novel transformer block, comprising an efficient attention block (EAB) and a differentiable token compression (DTC) mechanism, is inserted into SDT2Net to acquire selectable token features for each scene image, guided by sparse shift local features and a learned token compression rate. Furthermore, a fast token fusion (FTF) module is developed to acquire more distinguishable token feature representations. This module uses the fast global covariance pooling algorithm to acquire high-order visual tokens and validates the effectiveness of classification tokens and high-order visual tokens for scene classification. Compared with other recent methods, SDT2Net achieves state-of-the-art performance with a comparable number of floating-point operations (FLOPs). The code will be available at https://github.com/RSIP-NJUPT/SDT2Net.
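To illustrate the idea of second-order (covariance) token statistics mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that the "fast" variant of global covariance pooling approximates the matrix square root of the token covariance with a coupled Newton-Schulz iteration, which is one common approach; the abstract does not state which algorithm SDT2Net uses. The function names (`matmul`, `covariance`, `sqrt_covariance_pool`) and the toy tokens are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def covariance(tokens):
    """Covariance (normalised by n) of n token vectors of dimension d."""
    n, d = len(tokens), len(tokens[0])
    means = [sum(t[j] for t in tokens) / n for j in range(d)]
    centered = [[t[j] - means[j] for j in range(d)] for t in tokens]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
            for i in range(d)]


def sqrt_covariance_pool(tokens, num_iters=5):
    """Approximate the matrix square root of the token covariance.

    Assumed technique: coupled Newton-Schulz iteration on the
    trace-normalised covariance, rescaled by sqrt(trace) afterwards.
    """
    cov = covariance(tokens)
    d = len(cov)
    trace = sum(cov[i][i] for i in range(d))
    eye = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    # Dividing by the trace puts all eigenvalues in (0, 1], which the
    # iteration needs in order to converge.
    y = [[v / trace for v in row] for row in cov]
    z = eye
    for _ in range(num_iters):
        zy = matmul(z, y)
        t = [[0.5 * (3.0 * eye[i][j] - zy[i][j]) for j in range(d)]
             for i in range(d)]
        # y tends to the square root, z to the inverse square root.
        y, z = matmul(y, t), matmul(t, z)
    scale = trace ** 0.5
    return [[v * scale for v in row] for row in y]


# Toy example: six 2-D "tokens" (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0],
          [0.0, -1.0], [1.0, 1.0], [-1.0, -1.0]]
root = sqrt_covariance_pool(tokens, num_iters=20)
```

Multiplying `root` by itself should reproduce `covariance(tokens)` up to numerical error. In a network, such a matrix would typically be flattened into a feature vector and combined with the classification token, but how SDT2Net fuses the two is not specified here.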

Pages: 1 / 15

Page count: 15

