HTC-Net: A hybrid CNN-transformer framework for medical image segmentation

Cited by: 39
Authors
Tang, Hui [1, 2]
Chen, Yuanbin [1, 2]
Wang, Tao [1, 2]
Zhou, Yuanbo [1, 2]
Zhao, Longxuan [1, 2]
Gao, Qinquan [1, 2, 3]
Du, Min [1, 2]
Tan, Tao [4]
Zhang, Xinlin [1, 2, 3]
Tong, Tong [1, 2, 3]
Affiliations
[1] Fuzhou Univ, Coll Phys & Informat Engn, Fuzhou, Peoples R China
[2] Fuzhou Univ, Fujian Key Lab Med Instrumentat & Pharmaceut Techn, Fuzhou, Peoples R China
[3] Imperial Vis Technol, Fuzhou, Peoples R China
[4] Macao Polytech Univ, Faculty Appl Sci, Macau, Macao Special A, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Medical image segmentation; Deep convolutional neural networks; Contextual information; Attention; CONVOLUTIONAL NEURAL-NETWORKS;
DOI
10.1016/j.bspc.2023.105605
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering];
Subject classification code
0831;
Abstract
Automated medical image segmentation is a crucial step in clinical analysis and diagnosis, as it can improve diagnostic efficiency and accuracy. Deep convolutional neural networks (DCNNs) have been widely used in the medical field and have achieved excellent results. However, the high complexity of medical images makes it difficult for many networks to balance local and global information, which leads to unstable segmentation outcomes. To address this challenge, we designed a hybrid CNN-Transformer network that captures both local and global information. More specifically, deep convolutional neural networks are introduced to exploit local information. At the same time, we designed a trident multi-layer fusion (TMF) block for the Transformer to dynamically fuse contextual information from higher-level (global) features. Moreover, considering the inherent characteristics of medical image segmentation (e.g., irregular shapes and discontinuous boundaries), we developed united attention (UA) blocks to focus on learning important features. To evaluate the effectiveness of the proposed approach, we performed experiments on two publicly available datasets, ISIC-2017 and Kvasir-SEG, and compared our results with state-of-the-art approaches. The experimental results demonstrate the superior performance of our approach. The code is available at https://github.com/Tanghui2000/HTC-Net.
Pages: 10