TriAxial Low-Rank Transformer for Efficient Medical Image Segmentation

Cited by: 0
Authors
Shang, Jiang [1]
Fang, Xi [2, 3]
Affiliations
[1] City Univ Hong Kong, Dept Informat Syst, Kowloon Tong, Hong Kong, Peoples R China
[2] Rensselaer Polytech Inst, Dept Biomed Engn, Troy, NY 12180 USA
[3] Rensselaer Polytech Inst, Ctr Biotechnol & Interdisciplinary Studies, Troy, NY 12180 USA
Keywords
Low-rank Representation; Efficient Self-attention; Vision Transformer; TriLoRa Attention; 3D Medical Image Segmentation;
DOI
10.1007/978-981-99-8432-9_8
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Transformer-CNN architectures have achieved state-of-the-art results on 3D medical image segmentation because they capture both long-range dependencies and local information. However, directly using existing transformers as encoders can be inefficient, particularly for high-resolution 3D medical images, because self-attention computes pixel-to-pixel relationships, which is computationally expensive. Local-window attention and axial-wise attention have been used to reduce this cost, but these methods may lose the interaction between certain local regions during the self-attention computation. Instead of using sparsified attention, we aim to incorporate the relationships between all pixels while substantially reducing the computational demand. Inspired by the low-rank property of attention, we hypothesize that the pixel-to-pixel relationship can be approximated by the composition of plane-to-plane relationships. We propose the TriAxial Low-Rank Transformer Network (TALoRT-Net) for medical image segmentation. The core of this model is its attention module, which approximates the pixel-to-pixel attention matrix using the low-rank representation of the product of plane-to-plane matrices and significantly reduces the computational complexity inherent in 3D self-attention. Moreover, we replace the linear projection and vanilla Multi-Layer Perceptron (MLP) in the Vision Transformer with a convolutional stem and a depthwise convolution layer (DCL) to further reduce the number of model parameters. We evaluated the method on the public BTCV dataset, where it significantly reduces the computational effort while maintaining uncompromised accuracy.
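The cost argument in the abstract can be illustrated with a small sketch. It is not the TriLoRa attention module itself, whose details are not given in this record. It assumes one simple reading of "composition of plane-to-plane relationships": a separate attention matrix per axis (sizes D x D, H x H, W x W) whose entries are multiplied to give a pixel-to-pixel weight, in the manner of a Kronecker product. All function names, the random matrices, and the volume sizes below are hypothetical.

```python
import math
import random
from itertools import product


def softmax_rows(matrix):
    """Row-wise softmax, so each row is a valid attention distribution."""
    out = []
    for row in matrix:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([v / total for v in exps])
    return out


def random_attention(n, rng):
    """Hypothetical n x n plane-to-plane attention matrix (random scores)."""
    scores = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    return softmax_rows(scores)


def attention_entry_counts(d, h, w):
    """Entries stored by full pixel-to-pixel attention vs. three per-axis matrices."""
    n = d * h * w
    return n * n, d * d + h * h + w * w


def composed_weight(a_d, a_h, a_w, query, key):
    """Assumed composition: product of the three per-axis weights."""
    (a, b, c), (d, e, f) = query, key
    return a_d[a][d] * a_h[b][e] * a_w[c][f]


rng = random.Random(0)
D, H, W = 2, 3, 4  # tiny volume so the full matrix can be enumerated
a_d = random_attention(D, rng)
a_h = random_attention(H, rng)
a_w = random_attention(W, rng)

# Every query pixel still attends to every key pixel, and the composed
# weights over all keys sum to 1, as in ordinary softmax attention.
pixels = list(product(range(D), range(H), range(W)))
row_sums = [
    sum(composed_weight(a_d, a_h, a_w, q, k) for k in pixels) for q in pixels
]

# Storage comparison for a hypothetical 96 x 96 x 96 volume.
full_entries, factored_entries = attention_entry_counts(96, 96, 96)
print(full_entries, factored_entries)
```

Under this assumption the composed matrix covers all pixel pairs while only the three small matrices are stored, which is the trade-off the abstract describes in contrast to window or axial sparsification.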
Pages: 91-102
Number of pages: 12