HCTA-Net: A Hybrid CNN-Transformer Attention Network for Surgical Instrument Segmentation

Cited by: 2
Authors
Yang, Lei [1]
Wang, Hongyong [1]
Bian, Guibin [1, 2]
Liu, Yanhong [1]
Affiliations
[1] Zhengzhou Univ, Sch Elect Engn, Zhengzhou 450001, Henan, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
Source
IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS | 2023 / Vol. 5 / No. 04
Funding
National Natural Science Foundation of China;
Keywords
Image segmentation; Feature extraction; Instruments; Transformers; Task analysis; Surgery; Robots; Surgical instruments; Deep architecture; Medical robotics; surgical instrument segmentation; transformer; residual network; deep supervision; FEATURE AGGREGATION; IMAGES;
DOI
10.1109/TMRB.2023.3315479
Chinese Library Classification
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
Surgical robots play an increasingly important role in surgery, and accurate surgical instrument segmentation is an important prerequisite for their stable operation. However, this task faces several challenging factors, such as scale changes and specular reflection. Recently, transformers have shown superior performance in image segmentation thanks to their strong ability to capture long-range dependencies. However, they do not capture locality and translation invariance well. In this paper, combining the advantages of transformers and CNNs, a hybrid CNN-Transformer attention network, named HCTA-Net, is proposed for automatic surgical instrument segmentation. To extract more comprehensive feature information from surgical images, a dual-path encoding unit is proposed for effective representation of both local detail features and global context. Meanwhile, an attention-based feature enhancement (AFE) module is proposed so that the features of the two encoding paths complement each other. In addition, to mitigate the limited processing capacity of simple connections, a multi-dimension attention (MDA) module is built to process the intermediate features along three directions (width, height, and space), filtering out interfering features while emphasizing the key regions of local feature maps. Further, an additive attention enhancement (AAE) module is introduced to further enhance local feature maps. Finally, to obtain richer multi-scale global information, a multi-scale context fusion (MCF) module is proposed at the bottleneck layer, providing different receptive fields to enrich the feature representation. Experimental results show that the proposed HCTA-Net achieves superior segmentation performance on surgical instruments compared with other state-of-the-art (SOTA) segmentation models.
Pages: 929-944
Number of pages: 16