LATrans-Unet: Improving CNN-Transformer with Location Adaptive for Medical Image Segmentation

Times Cited: 0
Authors
Lin, Qiqin [1 ]
Yao, Junfeng [1 ,2 ,3 ]
Hong, Qingqi [1 ,3 ,4 ]
Cao, Xianpeng [1 ]
Zhou, Rongzhou [1 ]
Xie, Weixing [1 ]
Affiliations
[1] Xiamen Univ, Sch Film, Sch Informat, Ctr Digital Media Comp, Xiamen 361005, Peoples R China
[2] Minist Culture & Tourism, Key Lab Digital Protect & Intelligent Proc Intang, Xiamen, Peoples R China
[3] Xiamen Univ, Inst Artificial Intelligence, Xiamen 361005, Peoples R China
[4] Hong Kong Ctr Cerebrocardiovasc Hlth Engn COCHE, Hong Kong, Peoples R China
Source
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT XIII | 2024, Vol. 14437
Keywords
Medical image segmentation; Transformer; Location information; Skip connection; NET;
DOI
10.1007/978-981-99-8558-6_19
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have been widely employed in medical image segmentation. While CNNs excel at local feature encoding, their ability to capture long-range dependencies is limited. In contrast, ViTs have strong global modeling capabilities. However, existing attention-based ViT models struggle to adaptively preserve accurate location information, leaving them unable to handle variations in the salient content of medical images. To inherit the merits of CNNs and ViTs while avoiding their respective limitations, we propose a novel framework called LATrans-Unet. By comprehensively enhancing the representation of information at both shallow and deep levels, LATrans-Unet maximizes the integration of location information and contextual details. At the shallow levels, a skip connection called SimAM-skip emphasizes information boundaries and bridges the encoder-decoder semantic gap. Additionally, to capture variations in organ shape and location in medical images, we propose Location-Adaptive Attention at the deep levels. It enables accurate segmentation by guiding the model to track changes globally and adaptively. Extensive experiments on multi-organ and cardiac segmentation tasks validate the superior performance of LATrans-Unet compared to previous state-of-the-art methods. The code and trained models will be made available soon.
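The abstract names a skip connection, SimAM-skip, whose details are not given in this record; its name suggests it builds on the parameter-free SimAM attention of Yang et al. (2021). The sketch below shows only plain SimAM on a single feature-map channel, in pure Python for clarity; the function name `simam_channel` and the regularizer `lam` are illustrative choices, not taken from the paper.

```python
import math


def simam_channel(x, lam=1e-4):
    """Parameter-free SimAM attention for one 2D feature-map channel.

    Each activation t is weighted by sigmoid(y), where
    y = (t - mu)^2 / (4 * (var + lam)) + 0.5 and mu, var are the
    channel's spatial mean and (n-1)-normalized variance.
    """
    h, w = len(x), len(x[0])
    n = h * w - 1                      # degrees of freedom for the variance
    flat = [v for row in x for v in row]
    mu = sum(flat) / (h * w)           # spatial mean of the channel
    d = [[(v - mu) ** 2 for v in row] for row in x]   # squared deviations
    s = sum(v for row in d for v in row)              # their sum
    denom = 4 * (s / n + lam)          # 4 * (variance + lambda) > 0
    # Scale each activation by its attention weight sigmoid(y).
    return [[x[i][j] / (1 + math.exp(-(d[i][j] / denom + 0.5)))
             for j in range(w)] for i in range(h)]
```

On a uniform map every deviation is zero, so all weights collapse to sigmoid(0.5) ≈ 0.622; an outlier activation gets a weight closer to 1, which is the "important neurons stand out" behavior the skip connection presumably exploits.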
Pages: 223-234
Page count: 12
Cited References
26 in total
[1] Cao H., 2023, Computer Vision - ECCV 2022 Workshops, Lecture Notes in Computer Science 13803, P205, DOI 10.1007/978-3-031-25066-8_9
[2] Chen J., 2021, arXiv
[3] Chen L.C., 2017, arXiv, DOI arXiv:1706.05587
[4] Dai Y., Gieseke F., Oehmcke S., Wu Y., Barnard K., Attentional Feature Fusion, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 2021, P3559-3568
[5] Deng J., 2009, Proc. CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
[6] Dosovitskiy A., 2021, arXiv, DOI arXiv:2010.11929
[7] Gu Z., Cheng J., Fu H., Zhou K., Hao H., Zhao Y., Zhang T., Gao S., Liu J., CE-Net: Context Encoder Network for 2D Medical Image Segmentation, IEEE Transactions on Medical Imaging, 2019, 38(10), P2281-2292
[8] He K., Zhang X., Ren S., Sun J., Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, P770-778
[9] Hou Q., Zhou D., Feng J., Coordinate Attention for Efficient Mobile Network Design, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, P13708-13717
[10] Huang H.M., 2020, Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), P1055, DOI 10.1109/ICASSP40776.2020.9053405