MDViT: Multi-domain Vision Transformer for Small Medical Image Segmentation Datasets

Cited by: 4

Authors:

Du, Siyi [1]

Bayasi, Nourhan [1]

Hamarneh, Ghassan [2]

Garbi, Rafeef [1]

Affiliations:

[1] Univ British Columbia, Vancouver, BC, Canada

[2] Simon Fraser Univ, Burnaby, BC, Canada

Source:

MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT IV | 2023 / Vol. 14223

Keywords:

Vision Transformer; Data-efficiency; Multi-domain Learning; Medical Image Segmentation; Dermatology; NET;

DOI:

10.1007/978-3-031-43901-8_43

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Subject classification code:

081202 ; 0835 ;

Abstract:

Despite its clinical utility, medical image segmentation (MIS) remains a daunting task due to images' inherent complexity and variability. Vision transformers (ViTs) have recently emerged as a promising solution to improve MIS; however, they require larger training datasets than convolutional neural networks. To overcome this obstacle, data-efficient ViTs were proposed, but they are typically trained using a single source of data, which overlooks the valuable knowledge that could be leveraged from other available datasets. Naively combining datasets from different domains can result in negative knowledge transfer (NKT), i.e., a decrease in model performance on some domains with non-negligible inter-domain heterogeneity. In this paper, we propose MDViT, the first multi-domain ViT that includes domain adapters to mitigate data-hunger and combat NKT by adaptively exploiting knowledge in multiple small data resources (domains). Further, to enhance representation learning across domains, we integrate a mutual knowledge distillation paradigm that transfers knowledge between a universal network (spanning all the domains) and auxiliary domain-specific network branches. Experiments on 4 skin lesion segmentation datasets show that MDViT outperforms state-of-the-art algorithms, with superior segmentation performance and a fixed model size at inference time, even as more domains are added. Our code is available at https://github.com/siyi-wind/MDViT.
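The mutual knowledge distillation idea in the abstract can be illustrated with a minimal, framework-free sketch. It assumes binary (lesion vs. background) per-pixel logits from two networks, uses a symmetric Bernoulli KL divergence as the distillation term (in the style of deep mutual learning), and adds a hypothetical weight `alpha`. All function names are illustrative; this is not the authors' MDViT implementation (the repository linked above holds that), and it omits the ViT backbone and the domain adapters entirely.

```python
import math
from typing import List

EPS = 1e-7  # clamp probabilities so log() stays finite


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _clamp(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def bce(p: float, y: float) -> float:
    """Binary cross-entropy for one pixel with label y in {0, 1}."""
    p = _clamp(p)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Bern(p) || Bern(q)) for one pixel's foreground probability."""
    p, q = _clamp(p), _clamp(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def mutual_kd(universal_logits: List[float], branch_logits: List[float]) -> float:
    """Hypothetical distillation term: symmetric KL averaged over pixels,
    so each network is pulled toward the other's prediction."""
    pu = [sigmoid(z) for z in universal_logits]
    pb = [sigmoid(z) for z in branch_logits]
    n = len(pu)
    return sum(bernoulli_kl(a, b) + bernoulli_kl(b, a) for a, b in zip(pu, pb)) / n


def training_loss(universal_logits: List[float], branch_logits: List[float],
                  labels: List[int], alpha: float = 0.5) -> float:
    """Supervised BCE for both networks plus the weighted distillation term.
    `alpha` is an assumed hyperparameter, not a value from the paper."""
    n = len(labels)
    sup_u = sum(bce(sigmoid(z), y) for z, y in zip(universal_logits, labels)) / n
    sup_b = sum(bce(sigmoid(z), y) for z, y in zip(branch_logits, labels)) / n
    return sup_u + sup_b + alpha * mutual_kd(universal_logits, branch_logits)


def predict(universal_logits: List[float]) -> List[int]:
    """Inference reads only the universal network; auxiliary branches are unused."""
    return [1 if sigmoid(z) >= 0.5 else 0 for z in universal_logits]


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    universal = [2.0, 1.0, -1.5, -2.0]  # toy logits from the universal network
    branch = [1.5, 0.5, -1.0, -2.5]     # toy logits from this sample's domain branch
    print(training_loss(universal, branch, labels))
    print(predict(universal))  # [1, 1, 0, 0]
```

In an actual training loop, the branch would be selected by each sample's domain, and each network would typically be distilled toward the other's detached prediction. Because `predict` uses only the universal logits, adding domains adds auxiliary branches at training time but leaves the inference-time model unchanged in this sketch.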


Pages: 448-458

Page count: 11
