Video-Instrument Synergistic Network for Referring Video Instrument Segmentation in Robotic Surgery

Cited by: 1
Authors
Wang, Hongqiu [1]
Yang, Guang [2]
Zhang, Shichen [1]
Qin, Jing [3]
Guo, Yike [4, 5, 6]
Xu, Bo [7]
Jin, Yueming [8, 9]
Zhu, Lei [1, 6, 10]
Affiliations
[1] Hong Kong Univ Sci & Technol Guangzhou, Robot & Autonomous Syst ROAS Thrust, Syst Hub, Guangzhou 511400, Peoples R China
[2] Imperial Coll London, Dept Bioengn, London SW7 2AZ, England
[3] Hong Kong Polytech Univ, Ctr Smart Hlth, Hong Kong, Peoples R China
[4] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, London SW7 2AZ, England
[5] Imperial Coll London, Dept Comp Sci, London SW7 2AZ, England
[6] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[7] Gen Hosp Southern Theatre Command PLA, Dept Anesthesiol, Guangzhou 510010, Peoples R China
[8] Natl Univ Singapore, Dept Biomed Engn, Singapore 119077, Singapore
[9] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119077, Singapore
[10] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China
Keywords
Instruments; Surgery; Image segmentation; Robots; Task analysis; Visualization; Accuracy; Robotic-assisted surgery; instrument segmentation; referring video object segmentation; video-language learning
DOI
10.1109/TMI.2024.3426953
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Surgical instrument segmentation is fundamentally important for facilitating cognitive intelligence in robot-assisted surgery. Although existing methods have achieved accurate instrument segmentation results, they generate segmentation masks of all instruments simultaneously and therefore cannot single out a target object or support interactive use. This paper focuses on a novel and essential task in robotic surgery, i.e., Referring Surgical Video Instrument Segmentation (RSVIS), which aims to automatically identify and segment, in each video frame, the target surgical instruments referred to by a given language expression. This interactive feature offers enhanced user engagement and customized experiences, greatly benefiting the development of the next generation of surgical education systems. To achieve this, this paper constructs two surgery video datasets to promote RSVIS research. Then, we devise a novel Video-Instrument Synergistic Network (VIS-Net) that learns both video-level and instrument-level knowledge to boost performance, whereas previous work utilized only video-level information. Meanwhile, we design a Graph-based Relation-aware Module (GRM) to model the correlation between multi-modal information (i.e., textual description and video frame) to facilitate the extraction of instrument-level information. Extensive experimental results on two RSVIS datasets show that VIS-Net significantly outperforms existing state-of-the-art referring segmentation methods. We will release our code and dataset for future research (https://github.com/whq-xxh/RSVIS).
Pages: 4457-4469
Number of pages: 13