CoT: Contourlet Transformer for Hierarchical Semantic Segmentation

Cited by: 0

Authors:

Shao, Yilin [1]

Sun, Long [1]

Jiao, Licheng [1]

Liu, Xu [1]

Liu, Fang [1]

Li, Lingling [1]

Yang, Shuyuan [1]

Affiliation:

[1] Xidian Univ, Sch Artificial Intelligence, Int Res Ctr Intelligent Percept & Computat, Minist Educ China,Key Lab Intelligent Percept & I, Xian 710071, Peoples R China

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2025 / Vol. 36 / No. 01

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Transformers; Semantics; Semantic segmentation; Task analysis; Computed tomography; Convolutional neural networks; Contourlet transform (CT); semantic segmentation; sparse convolution; Transformer-convolutional neural network (CNN) hybrid model;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Transformer-convolutional neural network (CNN) hybrid learning approaches are gaining traction in hierarchical semantic segmentation because they balance deep and shallow image features. However, they still face a conflict between comprehensive semantic understanding and meticulous detail extraction. To address this problem, this article proposes a novel Transformer-CNN hybrid hierarchical network, dubbed contourlet transformer (CoT). In the CoT framework, the Transformer's semantic representation is inevitably peppered with sparsely distributed points that, although undesirable, demand finer detail. We therefore design a deep detail representation (DDR) structure to capture their fine-grained features. First, the contourlet transform (CT) distills high-frequency directional components from the raw image, yielding localized features that suit the inductive bias of CNNs. Second, a CNN deep sparse learning (DSL) module takes these components as input and learns the underlying detail features. This memory- and energy-efficient method keeps the same sparsity pattern in its input and output. Finally, the decoder hierarchically fuses the detail features with the semantic features in a manner resembling image reconstruction. Experiments show that CoT achieves competitive performance on three benchmark datasets: PASCAL Context [57.21% mean intersection over union (mIoU)], ADE20K (54.16% mIoU), and Cityscapes (84.23% mIoU). Robustness studies further validate its resistance to various types of corruption. Our code is available at https://github.com/yilinshao/CoT-Contourlet-Transformer.
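The abstract says the DSL module keeps the same sparsity pattern in its input and output, but this record does not say how. One well-known mechanism with exactly that property is submanifold sparse convolution, which computes outputs only at sites that are active in the input. The sketch below assumes that mechanism; it is not the authors' code. It also uses plain thresholding of a single dense band as a crude, hypothetical stand-in for the contourlet high-frequency components. The names `sparsify` and `submanifold_conv` and the toy values are illustrative only.

```python
from typing import Dict, List, Tuple

# A sparse feature map stored as {(row, col): value}; absent sites are inactive.
SparseMap = Dict[Tuple[int, int], float]


def sparsify(band: List[List[float]], threshold: float) -> SparseMap:
    """Hypothetical stand-in for a high-frequency band: keep only sites whose
    magnitude exceeds `threshold`; everything else becomes inactive."""
    return {
        (r, c): v
        for r, row in enumerate(band)
        for c, v in enumerate(row)
        if abs(v) > threshold
    }


def submanifold_conv(sites: SparseMap, kernel: List[List[float]]) -> SparseMap:
    """Submanifold-style sparse convolution (odd k x k kernel, stride 1).

    Like most deep-learning "convolutions" this is a cross-correlation.
    Outputs are computed only at input-active sites and only active neighbours
    contribute, so the output has exactly the input's sparsity pattern.
    """
    k = len(kernel)
    half = k // 2
    out: SparseMap = {}
    for (r, c) in sites:
        acc = 0.0
        for i in range(k):
            for j in range(k):
                neighbour = (r + i - half, c + j - half)
                if neighbour in sites:  # inactive neighbours count as zero
                    acc += kernel[i][j] * sites[neighbour]
        out[(r, c)] = acc
    return out


# Toy example: a 3 x 4 "band" with three strong responses.
band = [
    [0.0, 0.9, 0.0, 0.0],
    [0.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0, -0.7],
]
vertical = [
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
]
sites = sparsify(band, threshold=0.5)
out = submanifold_conv(sites, vertical)
print(sorted(out))  # prints [(0, 1), (1, 1), (2, 3)]
```

The two vertically adjacent sites in column 1 reinforce each other (about 1.7 each). The isolated site at (2, 3) keeps its value of -0.7. A dense 3 × 3 pass over the same band would also write non-zero responses at new sites such as (2, 1) and (1, 3). That growth of the active set is what a sparsity-preserving design avoids, and it is one reason such layers can save memory and computation.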


Pages: 132-146

Page count: 15
