CoT: Contourlet Transformer for Hierarchical Semantic Segmentation

Cited by: 0
Authors
Shao, Yilin [1]
Sun, Long [1]
Jiao, Licheng [1]
Liu, Xu [1]
Liu, Fang [1]
Li, Lingling [1]
Yang, Shuyuan [1]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Int Res Ctr Intelligent Percept & Computat, Minist Educ China, Key Lab Intelligent Percept & I, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Transformers; Semantics; Semantic segmentation; Task analysis; Computed tomography; Convolutional neural networks; Contourlet transform (CT); semantic segmentation; sparse convolution; Transformer-convolutional neural network (CNN) hybrid model;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Transformer-convolutional neural network (CNN) hybrid learning is gaining traction as a way to balance deep and shallow image features in hierarchical semantic segmentation. However, such approaches still face a tension between comprehensive semantic understanding and meticulous detail extraction. To resolve it, this article proposes a novel Transformer-CNN hybrid hierarchical network, dubbed contourlet transformer (CoT). In the CoT framework, the semantic representation process of the Transformer is unavoidably peppered with sparsely distributed points that, while not desired, demand finer detail. Therefore, we design a deep detail representation (DDR) structure to investigate their fine-grained features. First, through the contourlet transform (CT), we distill the high-frequency directional components from the raw image, yielding localized features that accommodate the inductive bias of CNNs. Second, a CNN deep sparse learning (DSL) module takes these components as input to represent the underlying detailed features. This memory- and energy-efficient learning method keeps the same sparse pattern between input and output. Finally, the decoder hierarchically fuses the detailed features with the semantic features in an image reconstruction-like fashion. Experiments demonstrate that CoT achieves competitive performance on three benchmark datasets: PASCAL Context [57.21% mean intersection over union (mIoU)], ADE20K (54.16% mIoU), and Cityscapes (84.23% mIoU). Furthermore, we conducted robustness studies to validate its resistance to various kinds of corruption. Our code is available at: https://github.com/yilinshao/CoT-Contourlet-Transformer.
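The abstract's remark that sparse learning "keeps the same sparse pattern between input and output" can be illustrated with a small sketch. This is not the authors' DSL module: it is a dense NumPy emulation of a sparsity-preserving 3x3 convolution, and the function name `sparse_preserving_conv3x3`, the toy feature map, and the all-ones kernel are assumptions made purely for illustration.

```python
import numpy as np

def sparse_preserving_conv3x3(x, kernel, active):
    """Hypothetical helper: 3x3 cross-correlation evaluated only at active sites.

    x      : (H, W) float array of feature values
    kernel : (3, 3) float array of weights
    active : (H, W) bool array marking the sparse (active) sites
    """
    h, w = x.shape
    # Zero out inactive inputs so they cannot contribute to any output,
    # then zero-pad by one pixel so every 3x3 window is well defined.
    padded = np.pad(np.where(active, x, 0.0), 1)
    out = np.zeros((h, w), dtype=float)
    # Visit only the active sites. Inactive outputs stay exactly zero,
    # so the set of active sites does not grow from layer to layer.
    for i, j in zip(*np.nonzero(active)):
        out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

# Toy example (assumed data): three active sites in a 5x5 map.
active = np.zeros((5, 5), dtype=bool)
active[1, 1] = active[1, 2] = active[3, 4] = True
x = np.where(active, 1.0, 0.0)
kernel = np.ones((3, 3))

# Sparsity-preserving output: nonzero only where the input was active.
y_sparse = sparse_preserving_conv3x3(x, kernel, active)

# For contrast, evaluating at every site (an ordinary dense convolution)
# spreads the response to the neighbours of each active site.
y_dense = sparse_preserving_conv3x3(x, kernel, np.ones_like(active))

print("active sites in input :", int(active.sum()))
print("nonzero sparse outputs:", int(np.count_nonzero(y_sparse)))
print("nonzero dense outputs :", int(np.count_nonzero(y_dense)))
```

Restricting the computation to active sites is what saves memory and compute in such a scheme: the work scales with the number of active points rather than with the image size. A practical implementation would use a dedicated sparse-convolution library rather than a Python loop.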
Pages: 132-146
Page count: 15