Video-Instrument Synergistic Network for Referring Video Instrument Segmentation in Robotic Surgery

Cited by: 1

Authors:

Wang, Hongqiu [1]

Yang, Guang [2]

Zhang, Shichen [1]

Qin, Jing [3]

Guo, Yike [4, 5, 6]

Xu, Bo [7]

Jin, Yueming [8, 9]

Zhu, Lei [1, 6, 10]

Affiliations:

[1] Hong Kong Univ Sci & Technol Guangzhou, Robot & Autonomous Syst ROAS Thrust, Syst Hub, Guangzhou 511400, Peoples R China

[2] Imperial Coll London, Dept Bioengn, London SW7 2AZ, England

[3] Hong Kong Polytech Univ, Ctr Smart Hlth, Hong Kong, Peoples R China

[4] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, London SW7 2AZ, England

[5] Imperial Coll London, Dept Comp Sci, London SW7 2AZ, England

[6] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China

[7] Gen Hosp Southern Theatre Command PLA, Dept Anesthesiol, Guangzhou 510010, Peoples R China

[8] Natl Univ Singapore, Dept Biomed Engn, Singapore 119077, Singapore

[9] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119077, Singapore

[10] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON MEDICAL IMAGING | 2024 / Vol. 43 / No. 12

Keywords:

Instruments; Surgery; Image segmentation; Robots; Task analysis; Visualization; Accuracy; Robotic-assisted surgery; instrument segmentation; referring video object segmentation; video-language learning;

DOI:

10.1109/TMI.2024.3426953

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Subject classification code:

081203 ; 0835 ;

Abstract:

Surgical instrument segmentation is fundamentally important for facilitating cognitive intelligence in robot-assisted surgery. Although existing methods have achieved accurate instrument segmentation results, they simultaneously generate segmentation masks of all instruments, which lack the capability to specify a target object and allow an interactive experience. This paper focuses on a novel and essential task in robotic surgery, i.e., Referring Surgical Video Instrument Segmentation (RSVIS), which aims to automatically identify and segment the target surgical instruments from each video frame, referred by a given language expression. This interactive feature offers enhanced user engagement and customized experiences, greatly benefiting the development of the next generation of surgical education systems. To achieve this, this paper constructs two surgery video datasets to promote the RSVIS research. Then, we devise a novel Video-Instrument Synergistic Network (VIS-Net) to learn both video-level and instrument-level knowledge to boost performance, while previous work only utilized video-level information. Meanwhile, we design a Graph-based Relation-aware Module (GRM) to model the correlation between multi-modal information (i.e., textual description and video frame) to facilitate the extraction of instrument-level information. Extensive experimental results on two RSVIS datasets exhibit that the VIS-Net can significantly outperform existing state-of-the-art referring segmentation methods. We will release our code and dataset for future research (https://github.com/whq-xxh/RSVIS).


Pages: 4457 - 4469

Page count: 13

