TriAxial Low-Rank Transformer for Efficient Medical Image Segmentation

Cited by: 0

Authors:

Shang, Jiang [1]

Fang, Xi [2,3]

Affiliations:

[1] City Univ Hong Kong, Dept Informat Syst, Kowloon Tong, Hong Kong, Peoples R China

[2] Rensselaer Polytech Inst, Dept Biomed Engn, Troy, NY 12180 USA

[3] Rensselaer Polytech Inst, Ctr Biotechnol & Interdisciplinary Studies, Troy, NY 12180 USA

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT II | 2024 / Vol. 14426

Keywords:

Low-rank Representation; Efficient Self-attention; Vision Transformer; TriLoRa Attention; 3D Medical Image Segmentation;

DOI:

10.1007/978-981-99-8432-9_8

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Transformer-CNN architectures have achieved state-of-the-art results on 3D medical image segmentation due to their ability to capture both long-term dependencies and local information. However, directly using existing transformers as encoders can be inefficient, particularly for high-resolution 3D medical images, because self-attention computes pixel-to-pixel relationships, which is computationally expensive. Attempts to mitigate this with local-window attention or axial-wise attention may lose the interaction between certain local regions during the self-attention computation. Instead of using sparsified attention, we aim to incorporate the relationships between all pixels while substantially reducing the computational demand. Inspired by the low-rank property of attention, we hypothesize that the pixel-to-pixel relationship can be approximated by the composition of plane-to-plane relationships. We propose the TriAxial Low-Rank Transformer Network (TALoRT-Net) for medical image segmentation. The core of this model is its attention module, which approximates the pixel-to-pixel attention matrix using the low-rank representation of the product of plane-to-plane matrices and significantly reduces the computational complexity inherent in 3D self-attention. Moreover, we replace the linear projection and vanilla Multi-Layer Perceptron (MLP) in the Vision Transformer with a convolutional stem and a depthwise convolution layer (DCL) to further reduce the number of model parameters. We evaluated the method on the public BTCV dataset, where it significantly reduces the computational effort without compromising accuracy.
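The cost argument in the abstract can be illustrated with a small NumPy sketch. This is not the TriLoRa attention module itself, whose exact formulation is not given in this record. It assumes, purely for illustration, that one attention matrix per axis (D x D, H x H, W x W) is combined as a Kronecker product to stand in for the full (DHW) x (DHW) pixel-to-pixel matrix. The function names `axis_attention`, `apply_factored`, `apply_full`, and `entry_counts` are hypothetical, and the random matrices replace learned attention weights.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def axis_attention(rng, n):
    # Hypothetical stand-in for a learned n x n attention matrix between
    # the n planes along one axis (random scores, row-normalized).
    return softmax(rng.standard_normal((n, n)))


def apply_factored(a_d, a_h, a_w, v):
    # Apply the Kronecker-structured attention to values v of shape
    # (D, H, W, C) one axis at a time, never building the full matrix.
    return np.einsum("ia,jb,kc,abcf->ijkf", a_d, a_h, a_w, v)


def apply_full(a_d, a_h, a_w, v):
    # Reference path: materialize the (DHW) x (DHW) matrix explicitly.
    d, h, w, c = v.shape
    full = np.kron(np.kron(a_d, a_h), a_w)
    return (full @ v.reshape(d * h * w, c)).reshape(d, h, w, c)


def entry_counts(d, h, w):
    # Number of attention entries: full pixel-to-pixel matrix versus
    # the three per-axis matrices assumed in this sketch.
    return (d * h * w) ** 2, d * d + h * h + w * w
```

For a 4 x 4 x 4 token grid, `entry_counts(4, 4, 4)` returns `(4096, 48)`, which shows how quickly the full matrix grows compared with per-axis matrices. Both application paths give the same result on small inputs, so the factored form can be checked against the explicit one before scaling up.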


Pages: 91-102

Page count: 12
