CoT: Contourlet Transformer for Hierarchical Semantic Segmentation

Authors
Shao, Yilin [1]
Sun, Long [1]
Jiao, Licheng [1]
Liu, Xu [1]
Liu, Fang [1]
Li, Lingling [1]
Yang, Shuyuan [1]
Affiliation
[1] Xidian Univ, Sch Artificial Intelligence, Int Res Ctr Intelligent Percept & Computat, Minist Educ China, Key Lab Intelligent Percept & I, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Transformers; Semantics; Semantic segmentation; Task analysis; Computed tomography; Convolutional neural networks; Contourlet transform (CT); semantic segmentation; sparse convolution; Transformer-convolutional neural network (CNN) hybrid model;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
The Transformer-convolutional neural network (CNN) hybrid learning approach is gaining traction for balancing deep and shallow image features in hierarchical semantic segmentation. However, such hybrid models still face a tension between comprehensive semantic understanding and meticulous detail extraction. To address this, this article proposes a novel Transformer-CNN hybrid hierarchical network, dubbed contourlet transformer (CoT). In the CoT framework, the semantic representation process of the Transformer is unavoidably peppered with sparsely distributed points that, while not desired, demand finer detail. Therefore, we design a deep detail representation (DDR) structure to investigate their fine-grained features. First, through the contourlet transform (CT), we distill the high-frequency directional components from the raw image, yielding localized features that accommodate the inductive bias of CNNs. Second, a CNN deep sparse learning (DSL) module takes these components as input to represent the underlying detailed features. This memory- and energy-efficient learning method keeps the same sparse pattern between input and output. Finally, the decoder hierarchically fuses the detailed features with the semantic features in an image reconstruction-like fashion. Experiments demonstrate that CoT achieves competitive performance on three benchmark datasets: PASCAL Context [57.21% mean intersection over union (mIoU)], ADE20K (54.16% mIoU), and Cityscapes (84.23% mIoU). Furthermore, we conduct robustness studies to validate its resistance to various kinds of corruption. Our code is available at: https://github.com/yilinshao/CoT-Contourlet-Transformer.
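The abstract's claim that the sparse learning step keeps the same sparse pattern between input and output can be illustrated with a small NumPy sketch. This is not the authors' DSL module: the function name `submanifold_conv2d`, the single-channel 2-D setting, and the dense `mask` representation are assumptions made for illustration only. The idea shown is a convolution evaluated only at active (non-zero) sites, so the output is non-zero only where the input was active.

```python
import numpy as np

def submanifold_conv2d(x, mask, kernel):
    """Cross-correlation evaluated only at active sites (illustrative sketch).

    x:      (H, W) array; values at inactive sites are ignored.
    mask:   (H, W) bool array marking the active (sparse) sites.
    kernel: (k, k) array with odd k.
    Returns an (H, W) array that is zero wherever mask is False.
    """
    k = kernel.shape[0]
    pad = k // 2
    # Inactive sites contribute zero to their neighbours.
    x_masked = np.where(mask, x, 0.0)
    padded = np.pad(x_masked, pad)
    out = np.zeros_like(x_masked, dtype=float)
    # Visit only active sites, so cost scales with their number, not with H * W.
    for i, j in zip(*np.nonzero(mask)):
        window = padded[i:i + k, j:j + k]
        out[i, j] = np.sum(window * kernel)
    return out

# Hypothetical example: three active sites on a 6 x 6 grid.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 6))
mask = np.zeros((6, 6), dtype=bool)
mask[1, 2] = mask[3, 3] = mask[4, 1] = True
y = submanifold_conv2d(x, mask, np.ones((3, 3)) / 9.0)
print(np.count_nonzero(y[~mask]))  # inactive sites stay exactly zero
```

A dense convolution would spread each active value into its neighbourhood and grow the set of non-zero sites at every layer; restricting outputs to the input's active set is one common way to avoid that, and the paper's actual mechanism may differ.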
Pages: 132-146
Page count: 15