Video-Instrument Synergistic Network for Referring Video Instrument Segmentation in Robotic Surgery

Cited by: 1

Authors:

Wang, Hongqiu ^{[1]}

Yang, Guang ^{[2]}

Zhang, Shichen ^{[1]}

Qin, Jing ^{[3]}

Guo, Yike ^{[4,5,6]}

Xu, Bo ^{[7]}

Jin, Yueming ^{[8,9]}

Zhu, Lei ^{[1,6,10]}

Affiliations:

[1] Hong Kong Univ Sci & Technol Guangzhou, Robot & Autonomous Syst ROAS Thrust, Syst Hub, Guangzhou 511400, Peoples R China

[2] Imperial Coll London, Dept Bioengn, London SW7 2AZ, England

[3] Hong Kong Polytech Univ, Ctr Smart Hlth, Hong Kong, Peoples R China

[4] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, London SW7 2AZ, England

[5] Imperial Coll London, Dept Comp Sci, London SW7 2AZ, England

[6] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China

[7] Gen Hosp Southern Theatre Command PLA, Dept Anesthesiol, Guangzhou 510010, Peoples R China

[8] Natl Univ Singapore, Dept Biomed Engn, Singapore 119077, Singapore

[9] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119077, Singapore

[10] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON MEDICAL IMAGING | 2024 / Vol. 43 / No. 12

Keywords:

Instruments; Surgery; Image segmentation; Robots; Task analysis; Visualization; Accuracy; Robotic-assisted surgery; instrument segmentation; referring video object segmentation; video-language learning;

DOI:

10.1109/TMI.2024.3426953

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Surgical instrument segmentation is fundamentally important for facilitating cognitive intelligence in robot-assisted surgery. Although existing methods have achieved accurate instrument segmentation results, they simultaneously generate segmentation masks of all instruments, which lack the capability to specify a target object and allow an interactive experience. This paper focuses on a novel and essential task in robotic surgery, i.e., Referring Surgical Video Instrument Segmentation (RSVIS), which aims to automatically identify and segment the target surgical instruments from each video frame, referred by a given language expression. This interactive feature offers enhanced user engagement and customized experiences, greatly benefiting the development of the next generation of surgical education systems. To achieve this, this paper constructs two surgery video datasets to promote the RSVIS research. Then, we devise a novel Video-Instrument Synergistic Network (VIS-Net) to learn both video-level and instrument-level knowledge to boost performance, while previous work only utilized video-level information. Meanwhile, we design a Graph-based Relation-aware Module (GRM) to model the correlation between multi-modal information (i.e., textual description and video frame) to facilitate the extraction of instrument-level information. Extensive experimental results on two RSVIS datasets exhibit that the VIS-Net can significantly outperform existing state-of-the-art referring segmentation methods. We will release our code and dataset for future research (https://github.com/whq-xxh/RSVIS).

Pages: 4457-4469

Page count: 13
