Hierarchical Decoder with Parallel Transformer and CNN for Medical Image Segmentation

Times cited: 0

Authors:

Li, Shijie [1]

Gong, Yu [1]

Xiang, Qingyuan [1]

Li, Zheng [1,2]

Affiliations:

[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China

[2] Sichuan Univ, Tianfu Engn Oriented Numerical Simulat & Software, Chengdu 610207, Peoples R China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT XIV | 2025 / Vol. 15044

Funding:

National Natural Science Foundation of China;

Keywords:

Medical image segmentation; Hierarchical decoder; Attention mechanism; PLUS PLUS;

DOI:

10.1007/978-981-97-8496-7_10

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the success of Transformers, hybrid Transformer and CNN methods have gained considerable popularity in medical image segmentation. These methods combine Transformers and CNNs to fuse global and local information, supplemented by a pyramid structure to facilitate multi-scale interaction. However, they encounter two primary limitations: (i) Transformers struggle to capture complete global information due to the sliding-window nature of the convolutional operator, and (ii) the pyramid structure within a single decoder fails to provide the multi-scale interaction needed to restore detailed features at higher levels. In this paper, we introduce the Hierarchical Decoder with Parallel Transformer and CNN (HiPar), a novel architecture designed to address these limitations. First, we present a parallel structure of Transformer and CNN to maximize the capture of both global and local features. Second, we propose a hierarchical decoder to model multi-scale information and progressively restore spatial details. Additionally, we incorporate lightweight components to enhance the efficiency of feature representation. Extensive experiments demonstrate that HiPar achieves state-of-the-art results on three popular medical image segmentation benchmarks: Synapse, ACDC and GlaS.
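The idea of running a global (attention) branch and a local (convolution) branch in parallel on the same input can be illustrated with a toy sketch. This is not the HiPar implementation: the scalar 1-D features, the single-head attention without learned projections, the fixed smoothing kernel, and the fusion by summation are all assumptions chosen only to keep the example small and self-contained.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_branch(seq):
    """Toy self-attention on scalar features (assumed, no learned weights):
    every position attends to every other position."""
    out = []
    for q in seq:
        weights = softmax([q * k for k in seq])
        out.append(sum(w * v for w, v in zip(weights, seq)))
    return out


def local_branch(seq, kernel=(0.25, 0.5, 0.25)):
    """Toy 1-D convolution with zero padding (assumed kernel):
    every position sees only its immediate neighbours."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(seq) + [0.0] * half
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(seq))
    ]


def parallel_block(seq):
    """Feed the same input to both branches and fuse by element-wise sum
    (the fusion rule is an assumption for illustration)."""
    g = global_branch(seq)
    l = local_branch(seq)
    return [a + b for a, b in zip(g, l)]


features = [0.0, 1.0, 0.0, 0.0]
fused = parallel_block(features)
```

In this toy input the last position lies outside the convolution's receptive field of the single non-zero feature, so the local branch contributes nothing there, while the global branch still passes information to it.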

Pages: 133-147

Page count: 15
