Exploring Segment-Level Semantics for Online Phase Recognition From Surgical Videos

Cited by: 29

Authors:

Ding, Xinpeng [1]

Li, Xiaomeng [1, 2]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China

[2] Hong Kong Univ Sci & Technol, Shenzhen Res Inst, Shenzhen 518057, Peoples R China

Source:

IEEE TRANSACTIONS ON MEDICAL IMAGING | 2022 / Vol. 41 / No. 11

Keywords:

Surgery; Videos; Feature extraction; Semantics; Hidden Markov models; Task analysis; Convolution; Surgical video analysis; surgical phase recognition; REAL-TIME SEGMENTATION; WORKFLOW RECOGNITION; TASKS;

DOI:

10.1109/TMI.2022.3182995

Chinese Library Classification:

TP39 [Computer applications];

Subject classification codes:

081203 ; 0835 ;

Abstract:

Automatic surgical phase recognition plays a vital role in robot-assisted surgeries. Existing methods ignored a pivotal problem that surgical phases should be classified by learning segment-level semantics instead of solely relying on frame-wise information. This paper presents a segment-attentive hierarchical consistency network (SAHC) for surgical phase recognition from videos. The key idea is to extract hierarchical high-level semantic-consistent segments and use them to refine the erroneous predictions caused by ambiguous frames. To achieve it, we design a temporal hierarchical network to generate hierarchical high-level segments. Then, we introduce a hierarchical segment-frame attention module to capture relations between the low-level frames and high-level segments. By regularizing the predictions of frames and their corresponding segments via a consistency loss, the network can generate semantic-consistent segments and then rectify the misclassified predictions caused by ambiguous low-level frames. We validate SAHC on two public surgical video datasets, i.e., the M2CAI16 challenge dataset and the Cholec80 dataset. Experimental results show that our method outperforms previous state-of-the-arts and ablation studies prove the effectiveness of our proposed modules. Our code has been released at: https://github.com/xmed-lab/SAHC.

Pages: 3309-3319

Page count: 11
