MULTI-SCALE CONVOLUTION-TRANSFORMER FUSION NETWORK FOR ENDOSCOPIC IMAGE SEGMENTATION

Cited by: 1
Authors
Zou, Baosheng [1 ,2 ]
Zhou, Zongguang [3 ,4 ,5 ]
Han, Ying [6 ]
Li, Kang [1 ]
Wang, Guotai [2 ,7 ]
Affiliations
[1] Sichuan Univ, West China Biomed Big Data Ctr, Chengdu, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Mech & Elect Engn, Chengdu, Peoples R China
[3] Sichuan Univ, Div Gastrointestinal Surg, Dept Gen Surg, West China Hosp, Chengdu, Peoples R China
[4] Sichuan Univ, Inst Digest Surg, Chengdu, Peoples R China
[5] Sichuan Univ, State Key Lab Biotherapy, Chengdu, Peoples R China
[6] Sichuan Univ, West China Med Simulat Ctr, West China Hosp, Chengdu, Peoples R China
[7] Shanghai AI Lab, Shanghai, Peoples R China
Source
2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI) | 2023
Funding
National Natural Science Foundation of China
Keywords
Medical image segmentation; Transformer; Endoscopic image; Image guided surgery;
DOI
10.1109/ISBI53787.2023.10230738
CLC Classification Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Code
081104; 0812; 0835; 1405
Abstract
Automatic semantic segmentation of endoscopic images is an essential part of computer-assisted surgical interventions. Recently, Convolutional Neural Networks (CNNs) have been widely applied to endoscopic image segmentation, but their performance is still limited by their weak ability to capture global long-range dependencies. This paper proposes a model that combines CNNs and Transformers to address this problem, named the Multi-scale Convolution-Transformer Fusion Network (MCTFNet). It consists of three components: 1) Multiple-parallel Multi-scale Transformer Convolution (MMTC) modules in parallel branches to extract multi-scale information, 2) a Multi-scale Information Fusion (MIF) module that fuses information across the parallel branches to enable interaction between different resolutions, and 3) a High-resolution Information Processing (HIP) module that preserves high-resolution features and avoids loss of detail. We evaluated our method on the HeiSurF dataset; it achieved an average Dice of 80.07%, outperforming state-of-the-art CNNs including HRNet (79.93%) and DeepLabv3 (78.34%), as well as several networks designed for medical image segmentation.
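The record contains no reference implementation, so the following is only a minimal PyTorch sketch of the general idea described in the abstract: parallel multi-resolution branches that each combine a convolution path with a self-attention path, fused across resolutions while a high-resolution output is preserved. All class names (ConvTransformerBlock, TwoBranchFusion), channel counts, and structural details are assumptions for illustration and do not correspond to the authors' actual MCTFNet modules (MMTC, MIF, HIP).

# Illustrative sketch only: parallel multi-resolution branches, each mixing
# a convolutional (local) path with a self-attention (global) path, fused by
# resampling to the high-resolution branch. NOT the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvTransformerBlock(nn.Module):
    """One branch block: a 3x3 conv path plus a multi-head self-attention path."""

    def __init__(self, channels, heads=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        local = self.conv(x)                       # local (CNN) features
        tokens = self.norm(x.flatten(2).transpose(1, 2))   # (B, H*W, C) tokens
        global_feat, _ = self.attn(tokens, tokens, tokens, need_weights=False)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        return local + global_feat                 # fuse local and global cues


class TwoBranchFusion(nn.Module):
    """Two parallel branches at full and half resolution, fused by upsampling."""

    def __init__(self, channels=32, num_classes=2):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.high = ConvTransformerBlock(channels)  # high-resolution branch
        self.low = ConvTransformerBlock(channels)   # low-resolution branch
        self.head = nn.Conv2d(channels, num_classes, 1)

    def forward(self, x):
        feat = self.stem(x)
        hi = self.high(feat)
        lo = self.low(F.avg_pool2d(feat, 2))
        lo_up = F.interpolate(lo, size=hi.shape[-2:], mode="bilinear",
                              align_corners=False)
        return self.head(hi + lo_up)                # high-res prediction kept


if __name__ == "__main__":
    model = TwoBranchFusion()
    out = model(torch.randn(1, 3, 64, 64))
    print(out.shape)  # torch.Size([1, 2, 64, 64])

Note that full self-attention over every pixel token is quadratic in image size; a practical multi-branch design would restrict attention to lower-resolution branches or windows, which is one likely motivation for applying Transformer blocks at multiple scales.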
Pages: 5
References (19 in total)
[1] Cao Hu, 2021, arXiv.
[2] Chen LC, 2017, arXiv, DOI arXiv:1706.05587.
[3] Dosovitskiy A., 2021, International Conference on Learning Representations, P1.
[4] Fu Jun, Liu Jing, Tian Haijie, Li Yong, Bao Yongjun, Fang Zhiwei, Lu Hanqing. Dual Attention Network for Scene Segmentation. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019: 3141-3149.
[5] Gu Zaiwang, Cheng Jun, Fu Huazhu, Zhou Kang, Hao Huaying, Zhao Yitian, Zhang Tianyang, Gao Shenghua, Liu Jiang. CE-Net: Context Encoder Network for 2D Medical Image Segmentation. IEEE Transactions on Medical Imaging, 2019, 38(10): 2281-2292.
[6] Huang Zilong, Wang Xinggang, Huang Lichao, Huang Chang, Wei Yunchao, Liu Wenyu. CCNet: Criss-Cross Attention for Semantic Segmentation. 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), 2019: 603-612.
[7] Isensee Fabian, Jaeger Paul F., Kohl Simon A. A., Petersen Jens, Maier-Hein Klaus H. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 2021, 18(2): 203-+.
[8] Jia X., 2021, Advances in Artificial Intelligence, Computation, and Data Science: For Medicine and Life Science, P271, DOI 10.1007/978-3-030-69951-211.
[9] Liu Ze, Lin Yutong, Cao Yue, Hu Han, Wei Yixuan, Zhang Zheng, Lin Stephen, Guo Baining. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 9992-10002.
[10] Long J, 2015, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), P3431, DOI 10.1109/CVPR.2015.7298965.