Video-Instrument Synergistic Network for Referring Video Instrument Segmentation in Robotic Surgery

Cited by: 1
Authors
Wang, Hongqiu [1 ]
Yang, Guang [2 ]
Zhang, Shichen [1 ]
Qin, Jing [3 ]
Guo, Yike [4 ,5 ,6 ]
Xu, Bo [7 ]
Jin, Yueming [8 ,9 ]
Zhu, Lei [1 ,6 ,10 ]
Affiliations
[1] Hong Kong Univ Sci & Technol Guangzhou, Robot & Autonomous Syst ROAS Thrust, Syst Hub, Guangzhou 511400, Peoples R China
[2] Imperial Coll London, Dept Bioengn, London SW7 2AZ, England
[3] Hong Kong Polytech Univ, Ctr Smart Hlth, Hong Kong, Peoples R China
[4] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[5] Imperial Coll London, Dept Comp Sci, London SW7 2AZ, England
[6] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[7] Gen Hosp Southern Theatre Command PLA, Dept Anesthesiol, Guangzhou 510010, Peoples R China
[8] Natl Univ Singapore, Dept Biomed Engn, Singapore 119077, Singapore
[9] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119077, Singapore
[10] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China
Keywords
Instruments; Surgery; Image segmentation; Robots; Task analysis; Visualization; Accuracy; Robotic-assisted surgery; instrument segmentation; referring video object segmentation; video-language learning
DOI
10.1109/TMI.2024.3426953
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline codes
081203; 0835
Abstract
Surgical instrument segmentation is fundamental to enabling cognitive intelligence in robot-assisted surgery. Although existing methods achieve accurate instrument segmentation, they generate masks for all instruments at once, so they cannot single out a specific target object or support an interactive experience. This paper focuses on a novel and essential task in robotic surgery, Referring Surgical Video Instrument Segmentation (RSVIS), which aims to automatically identify and segment, in every video frame, the target surgical instruments referred to by a given language expression. This interactive capability enhances user engagement and enables customized experiences, greatly benefiting the development of next-generation surgical education systems. To this end, we construct two surgical video datasets to promote RSVIS research. We then devise a novel Video-Instrument Synergistic Network (VIS-Net) that learns both video-level and instrument-level knowledge to boost performance, whereas previous work used only video-level information. In addition, we design a Graph-based Relation-aware Module (GRM) that models the correlation between multi-modal information (i.e., the textual description and the video frame) to facilitate the extraction of instrument-level information. Extensive experiments on the two RSVIS datasets show that VIS-Net significantly outperforms existing state-of-the-art referring segmentation methods. We will release our code and datasets for future research (https://github.com/whq-xxh/RSVIS).
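To make the idea of graph-based relation modeling between text and video frames more concrete, here is a minimal sketch under stated assumptions. It is not the published GRM or VIS-Net. It assumes that word features and per-pixel visual features have already been extracted as vectors of the same dimension. Every word and every pixel is treated as a node of one fully connected graph, and edges are weighted by softmax-normalized scaled dot-product similarity. The sketch performs a single residual message-passing step, then scores each refined pixel against the mean-pooled sentence feature. The function names (`relation_aware_update`, `referring_mask`, `segment_video`) and the 0.5 threshold are hypothetical.

```python
import math
from typing import List

# Hypothetical sketch: NOT the published GRM/VIS-Net, only an illustration of
# graph-based relation reasoning between text and visual features.
Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_aware_update(word_feats: List[Vector], pixel_feats: List[Vector]) -> List[Vector]:
    """One residual message-passing step on a fully connected graph whose
    nodes are all word features followed by all pixel features."""
    nodes = word_feats + pixel_feats
    d = len(nodes[0])
    refined = []
    for n_i in nodes:
        # Edge weights: scaled dot-product similarity, softmax over all nodes.
        weights = softmax([dot(n_i, n_j) / math.sqrt(d) for n_j in nodes])
        # Aggregate neighbor features weighted by edge strength.
        message = [sum(w * n_j[k] for w, n_j in zip(weights, nodes)) for k in range(d)]
        # Residual connection keeps the node's original feature.
        refined.append([x + m for x, m in zip(n_i, message)])
    # Word nodes come first, so the refined pixel nodes are the tail.
    return refined[len(word_feats):]


def referring_mask(word_feats: List[Vector], pixel_feats: List[Vector],
                   threshold: float = 0.5) -> List[int]:
    """Binary mask over one frame's pixels for the referred instrument."""
    refined = relation_aware_update(word_feats, pixel_feats)
    d = len(word_feats[0])
    # Mean-pool word features into a single sentence embedding.
    sentence = [sum(w[k] for w in word_feats) / len(word_feats) for k in range(d)]
    # Sigmoid of scaled similarity gives a per-pixel foreground probability.
    probs = [1.0 / (1.0 + math.exp(-dot(p, sentence) / math.sqrt(d))) for p in refined]
    return [int(p > threshold) for p in probs]


def segment_video(word_feats: List[Vector], frames: List[List[Vector]]) -> List[List[int]]:
    """Apply the same expression to every frame independently."""
    return [referring_mask(word_feats, frame) for frame in frames]


if __name__ == "__main__":
    words = [[1.0, 0.0]]                  # toy expression embedding
    frame = [[2.0, 0.0], [-2.0, 0.0]]     # toy pixel features
    print(segment_video(words, [frame]))  # [[1, 0]]
```

This sketch processes every frame independently, so it captures only the cross-modal (text-to-pixel) relation. It does not model the video-level knowledge that the abstract attributes to VIS-Net.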
Pages: 4457-4469
Number of pages: 13