Video-Instrument Synergistic Network for Referring Video Instrument Segmentation in Robotic Surgery

Cited by: 1
Authors
Wang, Hongqiu [1]
Yang, Guang [2]
Zhang, Shichen [1]
Qin, Jing [3]
Guo, Yike [4, 5, 6]
Xu, Bo [7]
Jin, Yueming [8, 9]
Zhu, Lei [1, 6, 10]
Affiliations
[1] Hong Kong Univ Sci & Technol Guangzhou, Robot & Autonomous Syst ROAS Thrust, Syst Hub, Guangzhou 511400, Peoples R China
[2] Imperial Coll London, Dept Bioengn, London SW7 2AZ, England
[3] Hong Kong Polytech Univ, Ctr Smart Hlth, Hong Kong, Peoples R China
[4] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[5] Imperial Coll London, Dept Comp Sci, London SW7 2AZ, England
[6] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[7] Gen Hosp Southern Theatre Command PLA, Dept Anesthesiol, Guangzhou 510010, Peoples R China
[8] Natl Univ Singapore, Dept Biomed Engn, Singapore 119077, Singapore
[9] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119077, Singapore
[10] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China
Keywords
Instruments; Surgery; Image segmentation; Robots; Task analysis; Visualization; Accuracy; Robotic-assisted surgery; instrument segmentation; referring video object segmentation; video-language learning
DOI
10.1109/TMI.2024.3426953
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Surgical instrument segmentation is fundamentally important for facilitating cognitive intelligence in robot-assisted surgery. Although existing methods achieve accurate instrument segmentation, they generate masks for all instruments simultaneously and therefore cannot single out a specified target object or support interactive use. This paper focuses on a novel and essential task in robotic surgery, Referring Surgical Video Instrument Segmentation (RSVIS), which aims to automatically identify and segment, in each video frame, the target surgical instruments referred to by a given language expression. This interactive capability offers enhanced user engagement and customized experiences, benefiting the development of the next generation of surgical education systems. To this end, this paper constructs two surgical video datasets to promote RSVIS research. We then devise a novel Video-Instrument Synergistic Network (VIS-Net) that learns both video-level and instrument-level knowledge to boost performance, whereas previous work utilized only video-level information. In addition, we design a Graph-based Relation-aware Module (GRM) to model the correlation between multi-modal information (i.e., textual description and video frame) and thereby facilitate the extraction of instrument-level information. Extensive experiments on the two RSVIS datasets show that VIS-Net significantly outperforms existing state-of-the-art referring segmentation methods. We will release our code and dataset for future research (https://github.com/whq-xxh/RSVIS).
Pages: 4457-4469
Number of pages: 13