Video-Instrument Synergistic Network for Referring Video Instrument Segmentation in Robotic Surgery

Times cited: 1

Authors:

Wang, Hongqiu [1]

Yang, Guang [2]

Zhang, Shichen [1]

Qin, Jing [3]

Guo, Yike [4, 5, 6]

Xu, Bo [7]

Jin, Yueming [8, 9]

Zhu, Lei [1, 6, 10]

Affiliations:

[1] Hong Kong Univ Sci & Technol Guangzhou, Robot & Autonomous Syst ROAS Thrust, Syst Hub, Guangzhou 511400, Peoples R China

[2] Imperial Coll London, Dept Bioengn, London SW7 2AZ, England

[3] Hong Kong Polytech Univ, Ctr Smart Hlth, Hong Kong, Peoples R China

[4] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, London SW7 2AZ, England

[5] Imperial Coll London, Dept Comp Sci, London SW7 2AZ, England

[6] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China

[7] Gen Hosp Southern Theatre Command PLA, Dept Anesthesiol, Guangzhou 510010, Peoples R China

[8] Natl Univ Singapore, Dept Biomed Engn, Singapore 119077, Singapore

[9] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119077, Singapore

[10] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON MEDICAL IMAGING | 2024 / Vol. 43 / No. 12

Keywords:

Instruments; Surgery; Image segmentation; Robots; Task analysis; Visualization; Accuracy; Robotic-assisted surgery; instrument segmentation; referring video object segmentation; video-language learning;

DOI:

10.1109/TMI.2024.3426953

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Surgical instrument segmentation is fundamentally important for facilitating cognitive intelligence in robot-assisted surgery. Although existing methods achieve accurate instrument segmentation, they generate masks for all instruments at once and therefore cannot single out a specified target object or support interactive use. This paper focuses on a novel and essential task in robotic surgery, Referring Surgical Video Instrument Segmentation (RSVIS), which aims to automatically identify and segment, in each video frame, the target surgical instruments referred to by a given language expression. This interactive capability enables greater user engagement and customized experiences, which can greatly benefit the development of next-generation surgical education systems. To this end, we construct two surgical video datasets to promote RSVIS research. We then devise a novel Video-Instrument Synergistic Network (VIS-Net) that learns both video-level and instrument-level knowledge to boost performance, whereas previous work used only video-level information. In addition, we design a Graph-based Relation-aware Module (GRM) that models the correlation between multi-modal information (i.e., the textual description and the video frame) to facilitate the extraction of instrument-level information. Extensive experiments on two RSVIS datasets show that VIS-Net significantly outperforms existing state-of-the-art referring segmentation methods. We will release our code and dataset for future research (https://github.com/whq-xxh/RSVIS).
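The abstract describes the GRM only at a high level: it relates the textual description to the video frame so that instrument-level information can be extracted. As a hedged illustration of that general idea, here is a minimal NumPy sketch of one round of message passing on a bipartite word-pixel graph, followed by a per-location relevance score against the expression. This is not the paper's GRM. The function names `cross_modal_graph_step` and `relevance_map`, the dot-product edge weights, the residual update, and the mean-pooled sentence embedding are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_graph_step(visual, words):
    """One round of word-to-pixel message passing (hypothetical sketch, not GRM).

    visual: (N, D) array, one node per spatial location of a frame feature map.
    words:  (T, D) array, one node per token of the referring expression.
    Returns the visual node features updated with text-conditioned messages.
    """
    d = visual.shape[1]
    # Edge weights: scaled dot-product affinity between every pixel and every word,
    # normalized so each pixel's outgoing edge weights sum to 1.
    adjacency = softmax(visual @ words.T / np.sqrt(d), axis=1)  # (N, T)
    # Aggregate word features along the edges and add them residually.
    messages = adjacency @ words  # (N, D)
    return visual + messages


def relevance_map(visual, words, height, width):
    """Cosine similarity of each updated pixel node to the mean word embedding."""
    updated = cross_modal_graph_step(visual, words)
    sentence = words.mean(axis=0)  # assumed sentence embedding: mean over tokens
    num = updated @ sentence
    den = np.linalg.norm(updated, axis=1) * np.linalg.norm(sentence) + 1e-8
    return (num / den).reshape(height, width)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, D, T = 4, 5, 8, 3  # toy feature-map size, channel width, token count
    visual = rng.standard_normal((H * W, D))
    words = rng.standard_normal((T, D))
    scores = relevance_map(visual, words, H, W)
    mask = scores > 0  # arbitrary threshold, chosen only for the sketch
    print(scores.shape)  # (4, 5)
```

A learned module would replace the fixed dot-product affinity with trainable projections and would typically also pass messages from pixels back to words; the sketch keeps a single direction so the graph structure stays easy to follow.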


Pages: 4457-4469

Page count: 13
