Video-Instrument Synergistic Network for Referring Video Instrument Segmentation in Robotic Surgery

Cited by: 1
Authors
Wang, Hongqiu [1]
Yang, Guang [2]
Zhang, Shichen [1]
Qin, Jing [3]
Guo, Yike [4, 5, 6]
Xu, Bo [7]
Jin, Yueming [8, 9]
Zhu, Lei [1, 6, 10]
Affiliations
[1] Hong Kong Univ Sci & Technol Guangzhou, Robot & Autonomous Syst ROAS Thrust, Syst Hub, Guangzhou 511400, Peoples R China
[2] Imperial Coll London, Dept Bioengn, London SW7 2AZ, England
[3] Hong Kong Polytech Univ, Ctr Smart Hlth, Hong Kong, Peoples R China
[4] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, London SW7 2AZ, England
[5] Imperial Coll London, Dept Comp Sci, London SW7 2AZ, England
[6] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[7] Gen Hosp Southern Theatre Command PLA, Dept Anesthesiol, Guangzhou 510010, Peoples R China
[8] Natl Univ Singapore, Dept Biomed Engn, Singapore 119077, Singapore
[9] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119077, Singapore
[10] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China
Keywords
Instruments; Surgery; Image segmentation; Robots; Task analysis; Visualization; Accuracy; Robotic-assisted surgery; instrument segmentation; referring video object segmentation; video-language learning;
DOI
10.1109/TMI.2024.3426953
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Surgical instrument segmentation is fundamentally important for facilitating cognitive intelligence in robot-assisted surgery. Although existing methods have achieved accurate instrument segmentation results, they simultaneously generate segmentation masks of all instruments, which lack the capability to specify a target object and allow an interactive experience. This paper focuses on a novel and essential task in robotic surgery, i.e., Referring Surgical Video Instrument Segmentation (RSVIS), which aims to automatically identify and segment the target surgical instruments from each video frame, referred by a given language expression. This interactive feature offers enhanced user engagement and customized experiences, greatly benefiting the development of the next generation of surgical education systems. To achieve this, this paper constructs two surgery video datasets to promote the RSVIS research. Then, we devise a novel Video-Instrument Synergistic Network (VIS-Net) to learn both video-level and instrument-level knowledge to boost performance, while previous work only utilized video-level information. Meanwhile, we design a Graph-based Relation-aware Module (GRM) to model the correlation between multi-modal information (i.e., textual description and video frame) to facilitate the extraction of instrument-level information. Extensive experimental results on two RSVIS datasets exhibit that the VIS-Net can significantly outperform existing state-of-the-art referring segmentation methods. We will release our code and dataset for future research (https://github.com/whq-xxh/RSVIS).
Pages: 4457-4469
Number of pages: 13