SMTF: Sparse transformer with multiscale contextual fusion for medical image segmentation

Times Cited: 8
Authors
Zhang, Xichu [1]
Zhang, Xiaozhi [1]
Ouyang, Lijun [2]
Qin, Chuanbo [3]
Xiao, Lin [4]
Xiong, Dongping [2]
Affiliations
[1] Univ South China, Sch Elect Engn, Hengyang 421001, Peoples R China
[2] Univ South China, Sch Comp Software, Hengyang 421001, Peoples R China
[3] Wuyi Univ, Fac Intelligent Mfg, Jiangmen 529020, Peoples R China
[4] Jiangmen Cent Hosp, Radiotherapy Ctr, Jiangmen 529020, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Medical image segmentation; Sparse attention; Deep supervision; Transformer; ATTENTION;
DOI
10.1016/j.bspc.2023.105458
Chinese Library Classification
R318 [Biomedical Engineering];
Discipline Classification Code
0831;
Abstract
Medical image segmentation aims to distinguish the object of interest from surrounding tissues and structures, which is essential for reliable diagnosis and morphological analysis of specific lesions. Automatic medical image segmentation has been significantly boosted by deep Convolutional Neural Networks (CNNs). However, CNNs usually fail to model long-range interactions because of the intrinsic locality of convolutional operations, which limits segmentation performance. Recently, the Transformer has been successfully applied to various computer vision tasks, leveraging the self-attention mechanism to model long-range interactions and capture global information. Nevertheless, self-attention lacks spatial locality and is computationally expensive. To address these issues, in this work we develop a new sparse medical Transformer (SMTF) with multiscale contextual fusion for medical image segmentation. The proposed model combines convolutional operations and attention mechanisms in a U-shaped framework to capture both local and global information. Specifically, to reduce the computational cost of the traditional Transformer, we design a novel sparse attention module that constructs Transformer layers using a spherical Locality Sensitive Hashing method. The sparse attention partitions the feature space into attention buckets, and attention is computed only within each individual bucket. The designed sparse Transformer layer is further combined with a bottleneck block to construct the SMTF encoder. Notably, the proposed sparse Transformer can also aggregate global feature information at early stages, which enables the model to learn richer local and global information by incorporating CNNs at the lower layers. Furthermore, we introduce a deep supervision strategy to guide the model in fusing multiscale feature information. This enables SMTF to effectively propagate features across layers, preserving more of the input spatial information and mitigating information attenuation. Benefiting from these designs, SMTF achieves better segmentation performance while being more robust and efficient. The proposed SMTF is evaluated on multiple medical image segmentation datasets and a clinical nasopharyngeal carcinoma dataset. Extensive experiments demonstrate its superiority in both qualitative and quantitative evaluations. Code and models are available at https://github.com/qmx717/sparse-attention.git.
Pages: 15
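The central mechanism described in the abstract, partitioning the feature space into buckets with spherical Locality Sensitive Hashing and computing attention only among tokens that fall into the same bucket, can be illustrated with a short sketch. The code below is a minimal, assumption-based toy in PyTorch and is not the authors' released implementation (see the repository linked in the abstract); the tensor shapes, bucket count, hashing scheme, and the mask-based formulation are illustrative choices.

```python
import torch
import torch.nn.functional as F


def spherical_lsh_buckets(x: torch.Tensor, n_buckets: int) -> torch.Tensor:
    """Hash each token to one of `n_buckets` by projecting its unit-normalised
    feature onto random directions and taking the argmax (spherical LSH)."""
    _, _, d = x.shape
    # Random projection shared across the batch; +R and -R together give n_buckets directions.
    r = torch.randn(d, n_buckets // 2, device=x.device)
    rotated = F.normalize(x, dim=-1) @ r              # (batch, tokens, n_buckets // 2)
    rotated = torch.cat([rotated, -rotated], dim=-1)  # (batch, tokens, n_buckets)
    return rotated.argmax(dim=-1)                     # (batch, tokens) bucket ids


def bucketed_attention(q, k, v, n_buckets: int = 8):
    """Self-attention restricted to tokens that share an LSH bucket. A mask over
    the full score matrix is used here purely for readability; an efficient
    version would gather tokens per bucket and never build the (n x n) matrix."""
    _, _, d = q.shape
    buckets = spherical_lsh_buckets(q, n_buckets)               # (batch, tokens)
    same_bucket = buckets.unsqueeze(2) == buckets.unsqueeze(1)  # (batch, n, n)
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5
    scores = scores.masked_fill(~same_bucket, float("-inf"))
    # Each token always shares a bucket with itself, so every row has a valid entry.
    return torch.softmax(scores, dim=-1) @ v


if __name__ == "__main__":
    x = torch.randn(2, 64, 32)           # (batch, tokens, channels)
    out = bucketed_attention(x, x, x)    # toy usage: q, k, v all taken from x
    print(out.shape)                     # torch.Size([2, 64, 32])
```

A practical implementation would sort or gather tokens by bucket and attend within fixed-size chunks so the quadratic score matrix is never materialised; the masking above is kept only to make the same-bucket constraint explicit.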