QT-UNet: A Self-Supervised Self-Querying All-Transformer U-Net for 3D Segmentation

Cited by: 4

Authors:

Haversen, Andreas Hammer [1]

Bavirisetti, Durga Prasad [1]

Kiss, Gabriel Hanssen [1]

Lindseth, Frank [1]

Affiliations:

[1] Norwegian Univ Sci & Technol, Dept Comp Sci, N-7034 Trondheim, Norway

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Decoding; Computational modeling; Power transformers; Three-dimensional displays; Microwave integrated circuits; Image segmentation; Brain modeling; Deep learning; Encoding; Biomedical imaging; Self-supervised learning; encoder-decoder cross-attention; UNet; medical image segmentation; Swin Transformer; vision transformer

DOI:

10.1109/ACCESS.2024.3395058

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

With reliable performance and linear time complexity, Vision Transformers like the Swin Transformer are gaining popularity in the field of Medical Image Computing (MIC). Examples of effective volumetric segmentation models for brain tumours include VT-UNet, which combines conventional UNets with Swin Transformers using a unique encoder-decoder Cross-Attention (CA) paradigm. Self-Supervised Learning (SSL) has also seen increased adoption in computer vision domains such as MIC, particularly where labelled training data is scarce. The Querying Transformer UNet (QT-UNet) model we introduce in this paper brings these advancements together. It is an all-Swin Transformer UNet with an encoder-decoder CA mechanism strengthened by SSL. To evaluate the potential of QT-UNet as a generic volumetric segmentation model, we test it extensively on several MIC datasets. Our best model achieves an average Dice score of 88.61 and a Hausdorff Distance of 4.85 mm, making it competitive with the state of the art on Brain Tumour Segmentation (BraTS) 2021 while using 40% fewer FLOPs than the baseline VT-UNet. We found poor results on Beyond The Cranial Vault (BTCV) and Medical Segmentation Decathlon (MSD), but we validate the effectiveness of our new CA mechanism and find that the SSL pipeline is most effective when pre-trained with our CT-SSL dataset. The code can be found at https://github.com/AndreasHaaversen/QT-UNet.


Pages: 62664-62676

Page count: 13
