QT-UNet: A Self-Supervised Self-Querying All-Transformer U-Net for 3D Segmentation

Cited by: 4
Authors
Haversen, Andreas Hammer [1]
Bavirisetti, Durga Prasad [1]
Kiss, Gabriel Hanssen [1]
Lindseth, Frank [1]
Affiliations
[1] Norwegian Univ Sci & Technol, Dept Comp Sci, N-7034 Trondheim, Norway
Keywords
Decoding; Computational modeling; Power transformers; Three-dimensional displays; Microwave integrated circuits; Image segmentation; Brain modeling; Deep learning; Encoding; Biomedical imaging; Self-supervised learning; encoder-decoder cross-attention; UNet; medical image segmentation; self-supervised learning; Swin Transformer; vision transformer
DOI
10.1109/ACCESS.2024.3395058
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With reliable performance and linear time complexity, Vision Transformers like the Swin Transformer are gaining popularity in the field of Medical Image Computing (MIC). Examples of effective volumetric segmentation models for brain tumours include VT-UNet, which combines conventional UNets with Swin Transformers using a unique encoder-decoder Cross-Attention (CA) paradigm. Self-Supervised Learning (SSL) has also seen increased adoption in computer vision domains such as MIC, where labelled training data is scarce. The Querying Transformer UNet (QT-UNet) model we introduce in this paper brings these advancements together. It is an all-Swin Transformer UNet with an encoder-decoder CA mechanism strengthened by SSL. To evaluate the potential of QT-UNet as a generic volumetric segmentation model, we test it extensively on several MIC datasets. Our best model achieves an average Dice score of 88.61 and a Hausdorff Distance of 4.85 mm, making it competitive with the state of the art in Brain Tumour Segmentation (BraTS) 2021 while using 40% fewer FLOPs than the baseline VT-UNet. We found poor results on Beyond The Cranial Vault (BTCV) and Medical Segmentation Decathlon (MSD), but we validate the effectiveness of our new CA mechanism and find that the SSL pipeline is most effective when pre-trained with our CT-SSL dataset. The code can be found at https://github.com/AndreasHaaversen/QT-UNet.
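For readers unfamiliar with the Dice score quoted in the abstract, the following is a minimal sketch of how the metric is commonly computed for a pair of binary masks. It is not taken from the QT-UNet code base. The function name `dice_score`, the flat 0/1 mask representation, and the convention of returning 1.0 for two empty masks are assumptions made for illustration. The abstract's figure of 88.61 is presumably this quantity expressed as a percentage and averaged over tumour regions and cases.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks.

    Both masks are flat sequences of 0/1 values (a 3D volume would be
    flattened first). Illustrative helper, not the paper's implementation.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy example: 3 predicted voxels, 3 ground-truth voxels, 2 in common.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_score(pred, target)
print(f"Dice: {100 * score:.2f}")
```

In the toy example the overlap is 2 voxels out of 3 + 3, so the score is 2·2/6, or about 66.67 when reported as a percentage.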
Pages: 62664-62676
Page count: 13