Desmoking of the Endoscopic Surgery Images Based on a Local-Global U-Shaped Transformer Model

Cited by: 0
Authors
Wang, Wanqing [1]
Liu, Fucheng [1]
Hao, Jianxiong [1]
Yu, Xiangyang [2]
Zhang, Bo [3]
Shi, Chaoyang [1]
Affiliations
[1] Tianjin Univ, Minist Educ, Sch Mech Engn, Key Lab Mech Theory & Equipment Design, Tianjin 300072, Peoples R China
[2] Tianjin Hosp, Dept Gastrointestinal Surg, Integrated Tradit Chinese & Western Med, Tianjin 300072, Peoples R China
[3] Waseda Univ, Future Robot Org, Tokyo 1620044, Japan
Source
IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS | 2025 / Vol. 7 / Issue 01
Funding
National Natural Science Foundation of China;
Keywords
Transformers; Atmospheric modeling; Robots; Instruments; Adaptation models; Biological system modeling; Visualization; Minimally invasive surgery; Medical robotics; Image segmentation; Desmoking; endoscope; surgical smoke; transformer; deep learning; robot-assisted minimally invasive surgery;
DOI
10.1109/TMRB.2024.3517139
Chinese Library Classification number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
In robot-assisted minimally invasive surgery (RMIS), the smoke generated by energy-based surgical instruments blurs and obstructs the endoscopic surgical field, which increases the difficulty and risk of robotic surgery. However, current desmoking research primarily focuses on natural weather conditions, with limited studies addressing desmoking techniques for endoscopic images. Furthermore, surgical smoke presents a notably intricate morphology, and research efforts aimed at uniform, non-uniform, thin, and dense smoke remain relatively limited. This work proposes a Local-Global U-Shaped Transformer Model (LGUformer) based on the U-Net and Transformer architectures to remove complex smoke from endoscopic images. By introducing a local-global multi-head self-attention mechanism and multi-scale depthwise convolution, the proposed model enhances the inference capability. An enhanced feature map fusion method improves the quality of reconstructed images. The improved modules enable efficient handling of variable smoke while generating superior-quality images. Through desmoking experiments on synthetic and real smoke images, the LGUformer model demonstrated superior performance compared with seven other desmoking models in terms of accuracy, clarity, absence of distortion, and robustness. A task-based surgical instrument segmentation experiment indicated the potential of this model as a pre-processing step in visual tasks. Finally, an ablation study was conducted to verify the advantages of the proposed modules.
Pages: 254-265
Page count: 12