Remote Sensing Scene Classification via Second-Order Differentiable Token Transformer Network

Times cited: 0
Authors
Ni, Kang [1, 2, 3]
Wu, Qianqian [1]
Li, Sichan [4]
Zheng, Zhizhong [1, 2]
Wang, Peng [3]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing 210023, Peoples R China
[2] Jiangsu Prov Engn Res Ctr Airborne Detecting & Int, Nanjing 210049, Peoples R China
[3] Nanjing Univ Aeronaut & Astronaut, Key Lab Radar Imaging & Microwave Photon, Minist Educ, Nanjing 211106, Peoples R China
[4] Nanjing Univ Posts & Telecommun, Coll Internet Things, Nanjing 210023, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62
Funding
National Natural Science Foundation of China
Keywords
Transformers; Remote sensing; Image coding; Merging; Representation learning; Visualization; Vectors; Classification token; learnable token; remote sensing; scene classification; vision transformer;
DOI
10.1109/TGRS.2024.3407879
Chinese Library Classification (CLC) numbers
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification codes
0708; 070902
Abstract
The vision transformer has been widely applied to remote sensing image scene classification because of its strong ability to capture global features. However, remote sensing scene images pose challenges such as scene complexity and small interclass differences, and directly using all of the transformer's global tokens for feature learning may increase computational complexity. Therefore, constructing a discriminative transformer network that adaptively selects tokens can effectively improve the classification performance of remote sensing scene images while keeping computational complexity in check. To this end, a second-order differentiable token transformer network (SDT2Net) is proposed, which exploits discriminative statistical features and nonredundant learnable tokens of remote sensing scene images. A novel transformer block, consisting of an efficient attention block (EAB) and a differentiable token compression (DTC) mechanism, is inserted into SDT2Net to acquire selectable token features of each scene image, guided by sparse shift local features and a learned token compression rate. Furthermore, a fast token fusion (FTF) module is developed to acquire more discriminative token feature representations. This module uses the fast global covariance pooling algorithm to obtain high-order visual tokens, and the effectiveness of the classification tokens and high-order visual tokens for scene classification is validated. Compared with other recent methods, SDT2Net achieves state-of-the-art performance with comparable floating-point operations (FLOPs). The code will be available at https://github.com/RSIP-NJUPT/SDT2Net.
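The abstract states that FTF uses "the fast global covariance pooling algorithm" to obtain high-order visual tokens. A widely used fast variant is iterative matrix square-root normalization (iSQRT-COV), which avoids eigendecomposition by combining trace pre-normalization, a few coupled Newton-Schulz iterations, and trace post-compensation. The sketch below assumes that variant. This record does not say whether SDT2Net uses exactly this algorithm, how many iterations it runs, or how the pooled matrix is turned into tokens and fused with the class token, so the function names and defaults here are hypothetical and the sketch stops at the normalized covariance matrix. Pure Python keeps it dependency-free.

```python
import math
from typing import List

Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain triple-loop matrix product a @ b.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def covariance(tokens: Matrix) -> Matrix:
    """Biased (1/N) covariance of N token vectors of dimension d, giving a d x d matrix."""
    n, d = len(tokens), len(tokens[0])
    mean = [sum(t[j] for t in tokens) / n for j in range(d)]
    centered = [[t[j] - mean[j] for j in range(d)] for t in tokens]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
            for i in range(d)]


def newton_schulz_sqrt(sigma: Matrix, iterations: int = 5) -> Matrix:
    """Approximate square root of a symmetric PSD matrix (iSQRT-COV style, assumed).

    1. Pre-normalize by the trace so all eigenvalues lie in [0, 1].
    2. Run coupled Newton-Schulz iterations (matrix products only).
    3. Post-compensate by sqrt(trace) to undo the normalization.
    """
    d = len(sigma)
    tr = sum(sigma[i][i] for i in range(d))
    if tr <= 0:
        raise ValueError("covariance must have positive trace")
    y = [[v / tr for v in row] for row in sigma]  # Y_0 = Sigma / tr(Sigma)
    z = identity(d)                                # Z_0 = I
    eye = identity(d)
    for _ in range(iterations):
        zy = matmul(z, y)
        # T = (3I - Z Y) / 2; then Y <- Y T and Z <- T Z, both using the old Y and Z.
        t = [[0.5 * (3.0 * eye[i][j] - zy[i][j]) for j in range(d)] for i in range(d)]
        y, z = matmul(y, t), matmul(t, z)
    scale = math.sqrt(tr)
    return [[scale * v for v in row] for row in y]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
    sigma = covariance(tokens)  # [[0.5, 0.0], [0.0, 2.0]]
    root = newton_schulz_sqrt(sigma, iterations=10)
    print(root)  # approximately diag(0.7071, 1.4142), the square root of sigma
```

Only matrix multiplications appear inside the loop, which is what makes this family of methods fast on GPUs compared with an eigendecomposition; the trace normalization is what keeps the iteration convergent.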
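The computational argument for token compression is that every matrix product in an encoder block scales with the token count N: linearly in the projections and MLP, quadratically in the attention map. The sketch below counts approximate multiply-accumulates for a standard ViT block and applies a hard top-k selection that keeps the class token plus the highest-scoring patch tokens. It is an illustration under stated assumptions, not DTC: the paper's mechanism is differentiable and learns its compression rate, whereas the hypothetical `keep_top_tokens` uses a fixed ratio and externally supplied scores (for example, the class token's attention weights). The ViT-B/16-like sizes are illustrative, not SDT2Net's configuration.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def block_macs(num_tokens: int, dim: int, mlp_ratio: int = 4) -> int:
    """Approximate multiply-accumulates for one standard ViT encoder block.

    Counts only matrix products: Q/K/V and output projections (4*N*d^2),
    attention scores Q K^T and the weighted sum A V (2*N^2*d), and the two
    MLP layers (2*mlp_ratio*N*d^2). Softmax, LayerNorm and biases are ignored.
    """
    n, d = num_tokens, dim
    attention = 4 * n * d * d + 2 * n * n * d
    mlp = 2 * mlp_ratio * n * d * d
    return attention + mlp


def keep_top_tokens(tokens: Sequence[T], scores: Sequence[float],
                    keep_ratio: float) -> List[T]:
    """Hard top-k token selection (hypothetical stand-in for learned compression).

    tokens[0] is treated as the class token and always kept; of the remaining
    patch tokens, the ceil(keep_ratio * (N - 1)) highest-scoring ones survive,
    in their original order.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if len(tokens) != len(scores):
        raise ValueError("need one score per token")
    num_patches = len(tokens) - 1
    k = max(1, math.ceil(keep_ratio * num_patches))
    ranked = sorted(range(1, len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = [0] + sorted(ranked[:k])
    return [tokens[i] for i in kept]


if __name__ == "__main__":
    # ViT-B/16-like setting: 196 patch tokens + 1 class token, width 768.
    full = block_macs(197, 768)
    # Keeping half of the patch tokens leaves 1 + 98 = 99 tokens.
    pruned = block_macs(99, 768)
    print(f"{full / 1e9:.2f} vs {pruned / 1e9:.2f} GMACs per block")
    # prints: 1.45 vs 0.72 GMACs per block
```

At this width the linear terms dominate (the quadratic term is roughly N/(6d), about 4%, of the linear ones), so halving the patch tokens roughly halves the per-block cost; the quadratic attention term matters more at higher input resolutions.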
Pages: 1-15
Number of pages: 15
Related papers (50 in total)
  • [1] First and Second-Order Information Fusion Networks for Remote Sensing Scene Classification
    Li, Erzhu
    Samat, Alim
    Zhang, Ce
    Du, Peijun
    Liu, Wei
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [2] A Hierarchical Graph-Enhanced Transformer Network for Remote Sensing Scene Classification
    Li, Ziwei
    Xu, Weiming
    Yang, Shiyu
    Wang, Juan
    Su, Hua
    Huang, Zhanchao
    Wu, Sheng
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 20315 - 20330
  • [3] Vision Transformer With Contrastive Learning for Remote Sensing Image Scene Classification
    Bi, Meiqiao
    Wang, Minghua
    Li, Zhi
    Hong, Danfeng
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16 : 738 - 749
  • [4] Masked Second-Order Pooling for Few-Shot Remote-Sensing Scene Classification
    Deng, Jianan
    Wang, Qianli
    Liu, Nanqing
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [5] Remote Sensing Scene Classification Using Spatial Transformer Fusion Network
    Tong, Shun
    Qi, Kunlun
    Guan, Qingfeng
    Zhu, Qiqi
    Yang, Chao
    Zheng, Jie
    IGARSS 2020 - 2020 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2020 : 549 - 552
  • [6] Faster and Better: A Lightweight Transformer Network for Remote Sensing Scene Classification
    Huang, Xinyan
    Liu, Fang
    Cui, Yuanhao
    Chen, Puhua
    Li, Lingling
    Li, Pengfang
    REMOTE SENSING, 2023, 15 (14)
  • [7] MITformer: A Multiinstance Vision Transformer for Remote Sensing Scene Classification
    Sha, Zongyao
    Li, Jianfeng
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [8] Attention Consistent Network for Remote Sensing Scene Classification
    Tang, Xu
    Ma, Qiushuo
    Zhang, Xiangrong
    Liu, Fang
    Ma, Jingjing
    Jiao, Licheng
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2021, 14 : 2030 - 2045
  • [9] Remote Sensing Scene Classification via Multi-Branch Local Attention Network
    Chen, Si-Bao
    Wei, Qing-Song
    Wang, Wen-Zhong
    Tang, Jin
    Luo, Bin
    Wang, Zu-Yuan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 99 - 109
  • [10] ATMformer: An Adaptive Token Merging Vision Transformer for Remote Sensing Image Scene Classification
    Niu, Yi
    Song, Zhuochen
    Luo, Qingyu
    Chen, Guochao
    Ma, Mingming
    Li, Fu
    REMOTE SENSING, 2025, 17 (04)