Exploring Segment-Level Semantics for Online Phase Recognition From Surgical Videos

Cited by: 22
Authors
Ding, Xinpeng [1]
Li, Xiaomeng [1,2]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China
[2] Hong Kong Univ Sci & Technol, Shenzhen Res Inst, Shenzhen 518057, Peoples R China
Keywords
Surgery; Videos; Feature extraction; Semantics; Hidden Markov models; Task analysis; Convolution; Surgical video analysis; surgical phase recognition; REAL-TIME SEGMENTATION; WORKFLOW RECOGNITION; TASKS;
DOI
10.1109/TMI.2022.3182995
CLC classification number
TP39 [Computer applications]
Discipline code
081203; 0835
Abstract
Automatic surgical phase recognition plays a vital role in robot-assisted surgery. Existing methods overlook a pivotal point: surgical phases should be classified by learning segment-level semantics rather than relying solely on frame-wise information. This paper presents a segment-attentive hierarchical consistency network (SAHC) for surgical phase recognition from videos. The key idea is to extract hierarchical, high-level, semantically consistent segments and use them to refine erroneous predictions caused by ambiguous frames. To achieve this, we design a temporal hierarchical network to generate hierarchical high-level segments. We then introduce a hierarchical segment-frame attention module to capture relations between low-level frames and high-level segments. By regularizing the predictions of frames and their corresponding segments via a consistency loss, the network generates semantically consistent segments and then rectifies misclassified predictions caused by ambiguous low-level frames. We validate SAHC on two public surgical video datasets, namely the M2CAI16 challenge dataset and the Cholec80 dataset. Experimental results show that our method outperforms previous state-of-the-art methods, and ablation studies demonstrate the effectiveness of the proposed modules. Our code has been released at: https://github.com/xmed-lab/SAHC.
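The segment-frame consistency idea in the abstract can be illustrated with a minimal sketch. This is not the SAHC implementation (the released repository is the authoritative source). It assumes, hypothetically, that a frame's segment prediction is the causal average of frame logits within a fixed, non-overlapping window of `window` frames, and that consistency is measured as KL(segment || frame) between the softmax distributions. The function names and the `window` parameter are illustrative only.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one frame's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def causal_segment_logits(frame_logits: List[List[float]], window: int) -> List[List[float]]:
    """Hypothetical segment summary: for frame t, average the logits of the
    frames from the start of its window up to t (no future frames, so the
    sketch stays usable for online recognition)."""
    num_classes = len(frame_logits[0])
    segments = []
    for t in range(len(frame_logits)):
        start = t - (t % window)
        span = frame_logits[start:t + 1]
        segments.append([sum(f[c] for f in span) / len(span) for c in range(num_classes)])
    return segments


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def segment_frame_consistency(frame_logits: List[List[float]], window: int) -> float:
    """Assumed consistency loss: mean KL(segment || frame) over all frames."""
    segments = causal_segment_logits(frame_logits, window)
    losses = [kl_divergence(softmax(s), softmax(f)) for s, f in zip(segments, frame_logits)]
    return sum(losses) / len(losses)


# An ambiguous third frame disagrees with its segment, so the loss is positive.
clip = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 0.0]]
print(segment_frame_consistency(clip, window=4))
```

When every frame in a window agrees, each segment average equals its frame's logits and the loss is exactly zero; a single ambiguous frame produces a positive penalty that pushes frame and segment predictions toward agreement.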
Pages: 3309-3319
Page count: 11
References
57 in total
  • [1] MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation
    Abu Farha, Yazan
    Gall, Juergen
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 3570 - 3579
  • [2] Arnab A., 2021, arXiv
  • [3] Ba JL, 2016, arXiv
  • [4] Blum T, 2010, LECT NOTES COMPUT SC, V6363, P400
  • [5] Pre-Trained Image Processing Transformer
    Chen, Hanting
    Wang, Yunhe
    Guo, Tianyu
    Xu, Chang
    Deng, Yiping
    Liu, Zhenhua
    Ma, Siwei
    Xu, Chunjing
    Xu, Chao
    Gao, Wen
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 12294 - 12305
  • [6] Czempiel Tobias, 2020, Medical Image Computing and Computer Assisted Intervention - MICCAI 2020. 23rd International Conference. Proceedings. Lecture Notes in Computer Science (LNCS 12263), P343, DOI 10.1007/978-3-030-59716-0_33
  • [7] Automatic data-driven real-time segmentation and recognition of surgical workflow
    Dergachyova, Olga
    Bouget, David
    Huaulme, Arnaud
    Morandi, Xavier
    Jannin, Pierre
    [J]. INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2016, 11 (06) : 1081 - 1089
  • [8] Ding XP, 2020, arXiv, DOI arXiv:2007.01598
  • [9] Support-Set Based Cross-Supervision for Video Grounding
    Ding, Xinpeng
    Wang, Nannan
    Zhang, Shiwei
    Cheng, De
    Li, Xiaomeng
    Huang, Ziyuan
    Tang, Mingqian
    Gao, Xinbo
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 11553 - 11562
  • [10] Dosovitskiy A, 2021, arXiv, DOI 10.48550/arXiv.2010.11929