CoT: Contourlet Transformer for Hierarchical Semantic Segmentation

Cited by: 0

Authors:

Shao, Yilin [1]

Sun, Long [1]

Jiao, Licheng [1]

Liu, Xu [1]

Liu, Fang [1]

Li, Lingling [1]

Yang, Shuyuan [1]

Affiliations:

[1] Xidian Univ, Sch Artificial Intelligence, Int Res Ctr Intelligent Percept & Computat, Minist Educ China, Key Lab Intelligent Percept & I, Xian 710071, Peoples R China

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2025 / Vol. 36 / No. 01

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Transformers; Semantics; Semantic segmentation; Task analysis; Computed tomography; Convolutional neural networks; Contourlet transform (CT); semantic segmentation; sparse convolution; Transformer-convolutional neural network (CNN) hybrid model;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The Transformer-convolutional neural network (CNN) hybrid learning approach is gaining traction for balancing deep and shallow image features in hierarchical semantic segmentation. However, such hybrid models still face a contradiction between comprehensive semantic understanding and meticulous detail extraction. To solve this problem, this article proposes a novel Transformer-CNN hybrid hierarchical network, dubbed contourlet transformer (CoT). In the CoT framework, the semantic representation process of the Transformer is unavoidably peppered with sparsely distributed points that, while not desired, demand finer detail. Therefore, we design a deep detail representation (DDR) structure to investigate their fine-grained features. First, through the contourlet transform (CT), we distill the high-frequency directional components from the raw image, yielding localized features that accommodate the inductive bias of CNNs. Second, a CNN deep sparse learning (DSL) module takes them as input to represent the underlying detailed features. This memory- and energy-efficient learning method can keep the same sparse pattern between input and output. Finally, the decoder hierarchically fuses the detailed features with the semantic features in an image reconstruction-like fashion. Experiments demonstrate that CoT achieves competitive performance on three benchmark datasets: PASCAL Context [57.21% mean intersection over union (mIoU)], ADE20K (54.16% mIoU), and Cityscapes (84.23% mIoU). Furthermore, we conducted robustness studies to validate its resistance to various kinds of corruption. Our code is available at: https://github.com/yilinshao/CoT-Contourlet-Transformer.
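
The abstract states that the DSL module keeps the same sparse pattern between input and output. As a rough illustration of what that property means, the sketch below evaluates a 3x3 convolution only at the active input sites, so the set of output coordinates equals the set of input coordinates. This is an assumption-laden toy, not the authors' DSL module: the function name `sparse_conv3x3`, the dictionary representation, and the "compute only at active sites" rule are all hypothetical choices made for illustration, and the abstract does not say which sparse convolution variant CoT uses.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def sparse_conv3x3(active: Dict[Coord, float],
                   kernel: List[List[float]]) -> Dict[Coord, float]:
    """Toy 3x3 convolution (cross-correlation) over a sparse 2-D signal.

    `active` maps (row, col) -> value for the non-empty sites only.
    Outputs are computed only at sites that are active in the input,
    so the sparse pattern of the output equals that of the input.
    (Hypothetical illustration; not the CoT implementation.)
    """
    out: Dict[Coord, float] = {}
    for (r, c) in active:
        acc = 0.0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                # Inactive neighbours are treated as zero and skipped.
                v = active.get((r + dr, c + dc))
                if v is not None:
                    acc += kernel[dr + 1][dc + 1] * v
        out[(r, c)] = acc
    return out


# Three active points: two adjacent ones and one isolated one.
points = {(0, 0): 1.0, (0, 1): 2.0, (5, 5): 3.0}
box = [[1.0] * 3 for _ in range(3)]  # all-ones 3x3 kernel
result = sparse_conv3x3(points, box)
```

With the all-ones kernel, each active site simply sums itself and its active neighbours, and no new sites are created around them. A dense convolution on the same input would instead spread non-zero values into the neighbouring cells, which is the growth in active sites that a pattern-preserving rule avoids.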


Pages: 132-146

Page count: 15

