MMMViT: Multiscale multimodal vision transformer for brain tumor segmentation with missing modalities

Times Cited: 2
Authors
Qiu, Chengjian [1 ]
Song, Yuqing [1 ]
Liu, Yi [1 ]
Zhu, Yan [2 ]
Han, Kai [1 ]
Sheng, Victor S. [3 ]
Liu, Zhe [1 ]
Affiliations
[1] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang 212013, Peoples R China
[2] Jiangsu Univ, Affiliated Hosp, Dept Radiol, Zhenjiang 212001, Peoples R China
[3] Texas Tech Univ, Dept Comp Sci, Lubbock, TX USA
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Brain tumor segmentation; Missing modalities; Global multiscale features; Correlation across modalities;
DOI
10.1016/j.bspc.2023.105827
Chinese Library Classification
R318 [Biomedical Engineering];
Discipline Code
0831;
Abstract
Accurate segmentation of brain tumors from multimodal MRI sequences is a critical prerequisite for brain tumor diagnosis, prognosis, and surgical treatment. In clinical practice, however, one or more modalities are often missing, which causes most previous methods that rely on complete modality data to fail. To address this problem, the current state-of-the-art Transformer-based approach directly fuses the available modality-specific features into a shared latent representation, aiming to extract common features that are robust to any combinatorial subset of the modalities. However, directly learning such a shared latent representation is not trivial because of the diversity of these combinatorial subsets. Furthermore, this approach exploits neither correlations across modalities nor global multiscale features. In this work, we propose a Multiscale Multimodal Vision Transformer (MMMViT), which not only leverages correlations across modalities to decouple the direct fusion procedure into two simple steps, but also fuses local multiscale features as the input of the intra-modal Transformer block to implicitly obtain global multiscale features and thereby adapt to brain tumors of various sizes. We evaluate the method on the BraTS 2018 dataset with both full-modality and various missing-modality inputs, and the results demonstrate that it achieves state-of-the-art performance. Code is available at: https://github.com/qiuchengjian/MMMViT.
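
The abstract points to two ideas: tokens from several local scales are fused before an intra-modal Transformer block (so attention spans all scales), and inter-modal fusion is decoupled into two steps. The minimal PyTorch sketch below only illustrates that general pattern; the module names, channel sizes, two-scale tokenizer, shared weights across modalities, and the simple averaging standing in for the second fusion step are illustrative assumptions, not the authors' implementation (see the linked repository for the actual code).

# Hedged sketch of multiscale tokenization followed by intra-modal attention.
# All design choices here are assumptions made for illustration only.
import torch
import torch.nn as nn


class MultiscaleTokenizer(nn.Module):
    """Extract local features at two scales and flatten them into one token sequence."""

    def __init__(self, in_ch: int = 1, dim: int = 64):
        super().__init__()
        # Two convolutional branches with different strides -> two spatial scales.
        self.coarse = nn.Conv3d(in_ch, dim, kernel_size=4, stride=4)
        self.fine = nn.Conv3d(in_ch, dim, kernel_size=2, stride=2)
        self.pool = nn.AdaptiveAvgPool3d(4)  # bring the fine branch to a small grid

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        coarse = self.coarse(x)                          # (B, dim, D/4, H/4, W/4)
        fine = self.pool(self.fine(x))                   # (B, dim, 4, 4, 4)
        tokens = torch.cat([coarse.flatten(2), fine.flatten(2)], dim=2)
        return tokens.transpose(1, 2)                    # (B, N_tokens, dim)


class IntraModalBlock(nn.Module):
    """Self-attention over the fused multiscale tokens of one modality."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.block(tokens)


if __name__ == "__main__":
    # Toy example: only two MRI modalities are available (the others are missing).
    modalities = [torch.randn(1, 1, 32, 32, 32) for _ in range(2)]
    tokenizer, intra = MultiscaleTokenizer(), IntraModalBlock()

    # Step 1 (assumed): per-modality multiscale tokenization + intra-modal attention.
    # Sharing one tokenizer across modalities is a simplification of this sketch.
    per_modality = [intra(tokenizer(m)) for m in modalities]

    # Step 2 (assumed): a plain average over available modalities stands in for the
    # correlation-guided inter-modal fusion described in the abstract.
    shared = torch.stack(per_modality, dim=0).mean(dim=0)
    print(shared.shape)  # torch.Size([1, 576, 64])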
Pages: 10