Robotic Scene Segmentation with Memory Network for Runtime Surgical Context Inference

Authors
Li, Zongyu [1]
Reyes, Ian [2, 3]
Alemzadeh, Homa [1]
Affiliations
[1] Univ Virginia, Dept Elect & Comp Engn, Charlottesville, VA 22903 USA
[2] Univ Virginia, Dept Comp Sci, Charlottesville, VA 22903 USA
[3] IBM Corp, New York, NY USA
Funding
National Science Foundation (USA)
DOI
10.1109/IROS55552.2023.10342013
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Surgical context inference has recently garnered significant attention in robot-assisted surgery as it can facilitate workflow analysis, skill assessment, and error detection. However, runtime context inference is challenging since it requires timely and accurate detection of the interactions among the tools and objects in the surgical scene based on the segmentation of video data. Moreover, existing state-of-the-art video segmentation methods are often biased against infrequent classes and fail to provide temporal consistency for segmented masks. This can negatively impact the context inference and accurate detection of critical states. In this study, we propose a solution to these challenges using a Space-Time Correspondence Network (STCN). STCN is a memory network that performs binary segmentation and minimizes the effects of class imbalance. The use of a memory bank in STCN allows for the utilization of past image and segmentation information, thereby ensuring consistency of the masks. Our experiments using the publicly available JIGSAWS dataset demonstrate that STCN achieves superior segmentation performance for objects that are difficult to segment, such as needle and thread, and improves context inference compared to the state of the art. We also demonstrate that segmentation and context inference can be performed at runtime without compromising performance.
Pages: 6601-6607
Number of pages: 7