Learning multi-axis representation in frequency domain for medical image segmentation

Times Cited: 0
Authors
Ruan, Jiacheng [1 ]
Gao, Jingsheng [1 ]
Xie, Mingye [1 ]
Xiang, Suncheng [2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Sch Biomed Engn, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Medical image segmentation; Attention mechanism; Frequency domain information; U-NET;
DOI
10.1007/s10994-024-06728-3
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, the Vision Transformer (ViT) has been widely used in medical image segmentation (MIS) because its self-attention mechanism models global knowledge in the spatial domain. However, many studies have focused on improving models in the spatial domain while neglecting the importance of frequency-domain information. We therefore propose the Multi-axis External Weights UNet (MEW-UNet), a U-shaped architecture that replaces the self-attention in ViT with our Multi-axis External Weights block. Specifically, the block applies a Fourier transform along each of the three axes of the input features and assigns external weights in the frequency domain, generated by our External Weights Generator; an inverse Fourier transform then maps the features back to the spatial domain. We evaluate our model on four datasets (Synapse, ACDC, ISIC17, and ISIC18), and our approach demonstrates competitive performance owing to its effective use of frequency-domain information.
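The frequency-domain operation described above (per-axis Fourier transform, elementwise external weighting, inverse transform) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of a 1-D FFT per axis, fixed (rather than generated) weight vectors, and the simple averaging of the three branches are all assumptions made for clarity.

```python
import numpy as np

def multi_axis_frequency_weighting(x, w_c, w_h, w_w):
    """Hypothetical sketch of multi-axis external weighting in the
    frequency domain, applied to a feature map x of shape (C, H, W).

    w_c, w_h, w_w are complex weight vectors, one per axis, standing in
    for the output of the paper's External Weights Generator.
    """
    out = np.zeros(x.shape, dtype=float)
    for axis, w in ((0, w_c), (1, w_h), (2, w_w)):
        # Fourier transform along one axis of the input features
        xf = np.fft.fft(x, axis=axis)
        # Broadcast the external weight over the remaining axes and
        # apply it elementwise in the frequency domain
        shape = [1, 1, 1]
        shape[axis] = x.shape[axis]
        xf = xf * w.reshape(shape)
        # Inverse transform back to the spatial domain
        out += np.fft.ifft(xf, axis=axis).real
    # Combine the three axis branches (simple average, an assumption)
    return out / 3.0
```

With all-ones weights each branch reduces to an identity FFT round-trip, so the output equals the input, which is a quick sanity check of the transform pair.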
Pages: 15
Related Papers
46 records in total
  • [1] Deep Learning Techniques for Automatic MRI Cardiac Multi-Structures Segmentation and Diagnosis: Is the Problem Solved?
    Bernard, Olivier; Lalande, Alain; Zotti, Clement; Cervenansky, Frederick; Yang, Xin; Heng, Pheng-Ann; Cetin, Irem; Lekadir, Karim; Camara, Oscar; Gonzalez Ballester, Miguel Angel; Sanroma, Gerard; Napel, Sandy; Petersen, Steffen; Tziritas, Georgios; Grinias, Elias; Khened, Mahendra; Kollerathu, Varghese Alex; Krishnamurthi, Ganapathy; Rohe, Marc-Michel; Pennec, Xavier; Sermesant, Maxime; Isensee, Fabian; Jaeger, Paul; Maier-Hein, Klaus H.; Full, Peter M.; Wolf, Ivo; Engelhardt, Sandy; Baumgartner, Christian F.; Koch, Lisa M.; Wolterink, Jelmer M.; Isgum, Ivana; Jang, Yeonggul; Hong, Yoonmi; Patravali, Jay; Jain, Shubham; Humbert, Olivier; Jodoin, Pierre-Marc
    [J]. IEEE TRANSACTIONS ON MEDICAL IMAGING, 2018, 37 (11): 2514 - 2525
  • [2] Berseth M, 2017, arXiv, DOI 10.48550/arXiv.1703.00523
  • [3] Cao H., 2021, arXiv, DOI 10.48550/arXiv.2105.05537
  • [4] Chen J., 2021, arXiv, DOI 10.48550/arXiv.2102.04306
  • [5] TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing
    Chen, Jierun; He, Tianlang; Zhuo, Weipeng; Ma, Li; Ha, Sangtae; Chan, S-H Gary
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 12538 - 12548
  • [6] Chen QZ, 2023, arXiv, DOI 10.48550/arXiv.2303.15671
  • [7] Codella N, 2019, arXiv, DOI 10.48550/arXiv.1902.03368
  • [8] HyperDense-Net: A Hyper-Densely Connected CNN for Multi-Modal Image Segmentation
    Dolz, Jose; Gopinath, Karthik; Yuan, Jing; Lombaert, Herve; Desrosiers, Christian; Ben Ayed, Ismail
    [J]. IEEE TRANSACTIONS ON MEDICAL IMAGING, 2019, 38 (05): 1116 - 1126
  • [9] Dosovitskiy A, 2021, arXiv, DOI 10.48550/arXiv.2010.11929
  • [10] Duan H., 2024, Wearable-based behaviour interpolation for semi-supervised human activity recognition