Video-Instrument Synergistic Network for Referring Video Instrument Segmentation in Robotic Surgery

Cited by: 1

Authors:

Wang, Hongqiu [1]

Yang, Guang [2]

Zhang, Shichen [1]

Qin, Jing [3]

Guo, Yike [4,5,6]

Xu, Bo [7]

Jin, Yueming [8,9]

Zhu, Lei [1,6,10]

Affiliations:

[1] Hong Kong Univ Sci & Technol Guangzhou, Robot & Autonomous Syst ROAS Thrust, Syst Hub, Guangzhou 511400, Peoples R China

[2] Imperial Coll London, Dept Bioengn, London SW7 2AZ, England

[3] Hong Kong Polytech Univ, Ctr Smart Hlth, Hong Kong, Peoples R China

[4] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, London SW7 2AZ, England

[5] Imperial Coll London, Dept Comp Sci, London SW7 2AZ, England

[6] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China

[7] Gen Hosp Southern Theatre Command PLA, Dept Anesthesiol, Guangzhou 510010, Peoples R China

[8] Natl Univ Singapore, Dept Biomed Engn, Singapore 119077, Singapore

[9] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119077, Singapore

[10] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON MEDICAL IMAGING | 2024, Vol. 43, No. 12

Keywords:

Instruments; Surgery; Image segmentation; Robots; Task analysis; Visualization; Accuracy; Robotic-assisted surgery; instrument segmentation; referring video object segmentation; video-language learning;

DOI:

10.1109/TMI.2024.3426953

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

Surgical instrument segmentation is fundamentally important for facilitating cognitive intelligence in robot-assisted surgery. Although existing methods achieve accurate instrument segmentation, they generate segmentation masks for all instruments simultaneously and therefore cannot single out a target object or support interactive use. This paper focuses on a novel and essential task in robotic surgery, Referring Surgical Video Instrument Segmentation (RSVIS), which aims to automatically identify and segment, in each video frame, the target surgical instruments referred to by a given language expression. This interactive capability offers enhanced user engagement and customized experiences, greatly benefiting the development of the next generation of surgical education systems. To this end, the paper constructs two surgery video datasets to promote RSVIS research. We then devise a novel Video-Instrument Synergistic Network (VIS-Net) that learns both video-level and instrument-level knowledge to boost performance, whereas previous work utilized only video-level information. We also design a Graph-based Relation-aware Module (GRM) that models the correlation between multi-modal information (i.e., the textual description and the video frame) to facilitate the extraction of instrument-level information. Extensive experiments on the two RSVIS datasets show that VIS-Net significantly outperforms existing state-of-the-art referring segmentation methods. We will release our code and dataset for future research (https://github.com/whq-xxh/RSVIS).


Pages: 4457-4469

Page count: 13

