HartleyMHA: Self-attention in Frequency Domain for Resolution-Robust and Parameter-Efficient 3D Image Segmentation

Cited by: 0

Authors:

Wong, Ken C. L. [1]

Wang, Hongzhi [1]

Syeda-Mahmood, Tanveer [1]

Affiliations:

[1] IBM Res, Almaden Res Ctr, San Jose, CA 95120 USA

Source:

MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT IV | 2023 / Vol. 14223

Keywords:

Image segmentation; Transformer; Fourier neural operator; Hartley transform; Resolution-robust

DOI:

10.1007/978-3-031-43901-8_35

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification code:

081202 ; 0835 ;

Abstract:

With the introduction of Transformers, different attention-based models have been proposed for image segmentation with promising results. Although self-attention captures long-range dependencies, its complexity is quadratic in the image size, which is especially costly in 3D. To avoid out-of-memory errors during training, input size reduction is usually required for 3D segmentation, but the accuracy can be suboptimal when the trained models are applied at the original image size. To address this limitation, inspired by the Fourier neural operator (FNO), we introduce the HartleyMHA model, which is robust to training image resolution and provides efficient self-attention. FNO is a deep learning framework for learning mappings between functions in partial differential equations, and it has the appealing properties of zero-shot super-resolution and a global receptive field. We modify the FNO by using the Hartley transform with shared parameters to reduce the model size by orders of magnitude, and this allows us to further apply self-attention in the frequency domain for more expressive high-order feature combination with improved efficiency. When tested on the BraTS'19 dataset, it was more robust to training image resolution than the other tested models while using less than 1% of their model parameters.
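The abstract does not define the Hartley transform itself, so the following is only a minimal illustrative sketch of the standard discrete Hartley transform (DHT) in one dimension, not the authors' implementation. It is meant to show two well-known properties that make the transform attractive in this setting: a real input gives a real spectrum, and the transform is its own inverse up to a factor of 1/N. The function names `cas`, `dht`, and `idht` and the sample signal are assumptions chosen for illustration.

```python
import math


def cas(theta):
    """Hartley kernel: cas(theta) = cos(theta) + sin(theta)."""
    return math.cos(theta) + math.sin(theta)


def dht(signal):
    """Naive O(N^2) discrete Hartley transform of a real sequence.

    H[k] = sum_j x[j] * cas(2*pi*j*k / N)
    The output is real-valued, unlike the complex output of a DFT.
    """
    n = len(signal)
    return [
        sum(x * cas(2.0 * math.pi * j * k / n) for j, x in enumerate(signal))
        for k in range(n)
    ]


def idht(spectrum):
    """Inverse DHT: the same transform applied again, scaled by 1/N."""
    n = len(spectrum)
    return [value / n for value in dht(spectrum)]


# Hypothetical sample data, used only to demonstrate the round trip.
signal = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.5, 4.0]
spectrum = dht(signal)
recovered = idht(spectrum)

# Largest absolute difference between the input and its round trip.
max_error = max(abs(a - b) for a, b in zip(signal, recovered))
```

In practice a DHT is usually computed from a fast Fourier transform (real part minus imaginary part) rather than with the quadratic loop above; the naive form is used here only to keep the definition visible.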


Pages: 364-373

Page count: 10
