ConvFormer: Plug-and-Play CNN-Style Transformers for Improving Medical Image Segmentation

Cited by: 37

Authors:

Lin, Xian [1]

Yan, Zengqiang [1]

Deng, Xianbo [2]

Zheng, Chuansheng [2]

Yu, Li [1]

Affiliations:

[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan, Peoples R China

[2] Huazhong Univ Sci & Technol, Union Hosp, Dept Radiol, Tongji Med Coll, Wuhan, Peoples R China

Source:

MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT IV | 2023, Vol. 14223

Funding:

National Natural Science Foundation of China

Keywords:

CNN-Style Transformers; Attention Collapse; Adaptive Self-Attention; Medical Image Segmentation; ATTENTION;

DOI:

10.1007/978-3-031-43901-8_61

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Discipline classification codes:

081202 ; 0835 ;

Abstract:

Transformers have been extensively studied in medical image segmentation to build pairwise long-range dependence. Yet, relatively limited well-annotated medical image data makes transformers struggle to extract diverse global features, resulting in attention collapse where attention maps become similar or even identical. Comparatively, convolutional neural networks (CNNs) have better convergence properties on small-scale training data but suffer from limited receptive fields. Existing works are dedicated to exploring the combinations of CNN and transformers while ignoring attention collapse, leaving the potential of transformers under-explored. In this paper, we propose to build CNN-style Transformers (ConvFormer) to promote better attention convergence and thus better segmentation performance. Specifically, ConvFormer consists of pooling, CNN-style self-attention (CSA), and convolutional feed-forward network (CFFN) corresponding to tokenization, self-attention, and feed-forward network in vanilla vision transformers. In contrast to positional embedding and tokenization, ConvFormer adopts 2D convolution and max-pooling for both position information preservation and feature size reduction. In this way, CSA takes 2D feature maps as inputs and establishes long-range dependency by constructing self-attention matrices as convolution kernels with adaptive sizes. Following CSA, 2D convolution is utilized for feature refinement through CFFN. Experimental results on multiple datasets demonstrate the effectiveness of ConvFormer working as a plug-and-play module for consistent performance improvement of transformer-based frameworks. Code is available at https://github.com/xianlin7/ConvFormer.
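The abstract describes attention collapse as attention maps becoming similar or even identical. As a rough illustration only, the sketch below scores a set of attention maps by their mean pairwise cosine similarity, where values near 1.0 would suggest collapse. This measure, the function names, and the toy 3x3 maps are all assumptions made for illustration; they are not taken from the paper or from the ConvFormer repository.

```python
import math
from itertools import combinations


def flatten(attention_map):
    """Flatten a 2D attention map (list of rows) into a single list."""
    return [value for row in attention_map for value in row]


def cosine_similarity(a, b):
    """Cosine similarity between two flattened attention maps."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero map as dissimilar to everything
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(attention_maps):
    """Average cosine similarity over all pairs of maps.

    Hypothetical collapse indicator: near 1.0 means the maps are
    almost identical, lower values mean they are more diverse.
    """
    flat = [flatten(m) for m in attention_maps]
    pairs = list(combinations(flat, 2))
    if not pairs:
        raise ValueError("need at least two attention maps")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Toy 3x3 attention maps with made-up values; each row sums to 1.
uniform = [[1 / 3, 1 / 3, 1 / 3] for _ in range(3)]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shifted = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

# Three identical maps: the fully collapsed case.
collapsed_score = mean_pairwise_similarity([uniform, uniform, uniform])
# Two maps attending to disjoint positions: the fully diverse case.
diverse_score = mean_pairwise_similarity([identity, shifted])

print(f"collapsed: {collapsed_score:.3f}, diverse: {diverse_score:.3f}")
```

In practice the maps would come from the heads or layers of a trained model rather than hand-written lists, and other similarity measures could be substituted.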


Pages: 642-651

Page count: 10
