Exploring Segment-Level Semantics for Online Phase Recognition From Surgical Videos

Cited by: 29

Authors:

Ding, Xinpeng [1]

Li, Xiaomeng [1,2]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China

[2] Hong Kong Univ Sci & Technol, Shenzhen Res Inst, Shenzhen 518057, Peoples R China

Source:

IEEE TRANSACTIONS ON MEDICAL IMAGING | 2022 / Vol. 41 / No. 11

Keywords:

Surgery; Videos; Feature extraction; Semantics; Hidden Markov models; Task analysis; Convolution; Surgical video analysis; surgical phase recognition; REAL-TIME SEGMENTATION; WORKFLOW RECOGNITION; TASKS;

DOI:

10.1109/TMI.2022.3182995

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Subject classification codes:

081203; 0835

Abstract:

Automatic surgical phase recognition plays a vital role in robot-assisted surgeries. Existing methods overlook a pivotal problem: surgical phases should be classified by learning segment-level semantics rather than by relying solely on frame-wise information. This paper presents a segment-attentive hierarchical consistency network (SAHC) for surgical phase recognition from videos. The key idea is to extract hierarchical, high-level, semantic-consistent segments and use them to refine the erroneous predictions caused by ambiguous frames. To achieve this, we design a temporal hierarchical network to generate hierarchical high-level segments. Then, we introduce a hierarchical segment-frame attention module to capture relations between the low-level frames and high-level segments. By regularizing the predictions of frames and their corresponding segments via a consistency loss, the network can generate semantic-consistent segments and then rectify the misclassified predictions caused by ambiguous low-level frames. We validate SAHC on two public surgical video datasets, i.e., the M2CAI16 challenge dataset and the Cholec80 dataset. Experimental results show that our method outperforms previous state-of-the-art methods, and ablation studies demonstrate the effectiveness of the proposed modules. Our code has been released at: https://github.com/xmed-lab/SAHC.


Pages: 3309-3319

Page count: 11
