A 3D hierarchical cross-modality interaction network using transformers and convolutions for brain glioma segmentation in MR images

Times Cited: 0
Authors
Zhuang, Yuzhou [1 ]
Liu, Hong [1 ]
Fang, Wei [2 ]
Ma, Guangzhi [1 ]
Sun, Sisi [1 ]
Zhu, Yunfeng [1 ]
Zhang, Xu [3 ]
Ge, Chuanbin [3 ]
Chen, Wenyang [1 ]
Long, Jiaosong [4 ]
Song, Enmin [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan, Peoples R China
[2] Wuhan Zhongke Ind Res Inst Med Sci Co Ltd, Wuhan, Peoples R China
[3] Wuhan United Imaging Healthcare Surg Technol Co Ltd, Wuhan, Peoples R China
[4] Hubei Univ Technol, Sch Art & Design, Wuhan, Peoples R China
Keywords
brain glioma segmentation; contextual information learning; MR images; multi-modal feature fusion; vision transformer; TUMOR SEGMENTATION; ATTENTION
DOI
10.1002/mp.17354
CLC Number
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline Codes
1002 ; 100207 ; 1009 ;
Abstract
Background: Precise glioma segmentation from multi-parametric magnetic resonance (MR) images is essential for brain glioma diagnosis. However, because of the indistinct boundaries between tumor sub-regions and the heterogeneous appearance of gliomas in volumetric MR scans, designing a reliable and automated glioma segmentation method remains challenging. Although existing 3D Transformer-based or convolution-based segmentation networks have obtained promising results via multi-modal feature fusion strategies or contextual learning methods, they generally lack hierarchical interactions between modalities and cannot effectively learn comprehensive feature representations for all glioma sub-regions.
Purpose: To overcome these problems, we propose a 3D hierarchical cross-modality interaction network (HCMINet) using Transformers and convolutions for accurate multi-modal glioma segmentation. The network leverages a hierarchical cross-modality interaction strategy to learn both modality-specific and modality-shared knowledge relevant to glioma sub-region segmentation from multi-parametric MR images.
Methods: In the HCMINet, we first design a hierarchical cross-modality interaction Transformer (HCMITrans) encoder that hierarchically encodes and fuses heterogeneous multi-modal features through Transformer-based intra-modal embeddings and inter-modal interactions across multiple encoding stages, capturing complex cross-modality correlations while modeling global contexts. We then pair the HCMITrans encoder with a modality-shared convolutional encoder to form a dual-encoder architecture that learns rich contextual information from both global and local perspectives. Finally, in the decoding stage, we present a progressive hybrid context fusion (PHCF) decoder that progressively fuses the local and global features extracted by the dual-encoder architecture, using a local-global context fusion (LGCF) module to efficiently alleviate the contextual discrepancy among decoding features.
Results: Extensive experiments were conducted on two public and competitive glioma benchmark datasets: BraTS2020 (494 patients) and BraTS2021 (1251 patients). The proposed method outperforms existing Transformer-based and CNN-based methods using other multi-modal fusion strategies in our experiments. Specifically, HCMINet achieves state-of-the-art mean DSC values of 85.33% and 91.09% on the BraTS2020 online validation dataset and the BraTS2021 local testing dataset, respectively.
Conclusions: Our proposed method can accurately and automatically segment glioma regions from multi-parametric MR images, which is beneficial for the quantitative analysis of brain gliomas and can help reduce the annotation burden on neuroradiologists.
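The inter-modal interaction step at the heart of the Methods description can be illustrated with a minimal single-head cross-attention sketch, in which feature tokens from one MR modality attend to tokens from another. All names, shapes, and the single-step setting here are illustrative assumptions, not the authors' implementation; the paper interleaves such interactions with intra-modal embeddings over multiple encoding stages.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(tokens_a, tokens_b, Wq, Wk, Wv):
    """One inter-modal interaction step (illustrative sketch):
    tokens of modality A query the keys/values of modality B."""
    q = tokens_a @ Wq            # queries from modality A
    k = tokens_b @ Wk            # keys from modality B
    v = tokens_b @ Wv            # values from modality B
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product
    return softmax(scores, axis=-1) @ v       # A tokens enriched with B context

# Hypothetical dimensions: 64 patch tokens per modality, embedding size 32.
rng = np.random.default_rng(0)
d = 32
t1_tokens = rng.standard_normal((64, d))     # e.g. T1-weighted feature tokens
flair_tokens = rng.standard_normal((64, d))  # e.g. FLAIR feature tokens
Wq, Wk, Wv = (rng.standard_normal((d, d)) * d**-0.5 for _ in range(3))

fused = cross_modal_attention(t1_tokens, flair_tokens, Wq, Wk, Wv)
print(fused.shape)  # (64, 32)
```

In the hierarchical scheme described above, one would apply such a step in both directions (A attends to B, B attends to A) at each encoding stage, so each modality stream accumulates shared context while retaining its own embedding path.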
Pages: 8371-8389
Number of Pages: 19