Exploring Segment-Level Semantics for Online Phase Recognition From Surgical Videos

Cited by: 29
Authors
Ding, Xinpeng [1]
Li, Xiaomeng [1, 2]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China
[2] Hong Kong Univ Sci & Technol, Shenzhen Res Inst, Shenzhen 518057, Peoples R China
Keywords
Surgery; Videos; Feature extraction; Semantics; Hidden Markov models; Task analysis; Convolution; Surgical video analysis; surgical phase recognition; REAL-TIME SEGMENTATION; WORKFLOW RECOGNITION; TASKS;
DOI
10.1109/TMI.2022.3182995
CLC number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
Automatic surgical phase recognition plays a vital role in robot-assisted surgeries. Existing methods overlook a pivotal problem: surgical phases should be classified by learning segment-level semantics instead of relying solely on frame-wise information. This paper presents a segment-attentive hierarchical consistency network (SAHC) for surgical phase recognition from videos. The key idea is to extract hierarchical, high-level, semantically consistent segments and use them to refine the erroneous predictions caused by ambiguous frames. To achieve this, we design a temporal hierarchical network to generate hierarchical high-level segments. Then, we introduce a hierarchical segment-frame attention module to capture relations between the low-level frames and high-level segments. By regularizing the predictions of frames and their corresponding segments via a consistency loss, the network can generate semantically consistent segments and then rectify the misclassified predictions caused by ambiguous low-level frames. We validate SAHC on two public surgical video datasets, i.e., the M2CAI16 challenge dataset and the Cholec80 dataset. Experimental results show that our method outperforms previous state-of-the-art methods, and ablation studies demonstrate the effectiveness of the proposed modules. Our code has been released at: https://github.com/xmed-lab/SAHC.
Pages: 3309-3319
Number of pages: 11
Related papers
57 in total
[11]   SlowFast Networks for Video Recognition [J].
Feichtenhofer, Christoph ;
Fan, Haoqi ;
Malik, Jitendra ;
He, Kaiming .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :6201-6210
[12]   TALL: Temporal Activity Localization via Language Query [J].
Gao, Jiyang ;
Sun, Chen ;
Yang, Zhenheng ;
Nevatia, Ram .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :5277-5285
[13]  
Gao X., 2021, arXiv, DOI 10.48550/ARXIV.2103.09712
[14]   Deep Residual Learning for Image Recognition [J].
He, Kaiming ;
Zhang, Xiangyu ;
Ren, Shaoqing ;
Sun, Jian .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778
[15]   Localizing Moments in Video with Natural Language [J].
Hendricks, Lisa Anne ;
Wang, Oliver ;
Shechtman, Eli ;
Sivic, Josef ;
Darrell, Trevor ;
Russell, Bryan .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :5804-5813
[16]   Temporal Memory Relation Network for Workflow Recognition From Surgical Video [J].
Jin, Yueming ;
Long, Yonghao ;
Chen, Cheng ;
Zhao, Zixu ;
Dou, Qi ;
Heng, Pheng-Ann .
IEEE TRANSACTIONS ON MEDICAL IMAGING, 2021, 40 (07) :1911-1923
[17]   Multi-task recurrent convolutional network with correlation loss for surgical video analysis [J].
Jin, Yueming ;
Li, Huaxia ;
Dou, Qi ;
Chen, Hao ;
Qin, Jing ;
Fu, Chi-Wing ;
Heng, Pheng-Ann .
MEDICAL IMAGE ANALYSIS, 2020, 59
[18]   SV-RCNet: Workflow Recognition From Surgical Videos Using Recurrent Convolutional Network [J].
Jin, Yueming ;
Dou, Qi ;
Chen, Hao ;
Yu, Lequan ;
Qin, Jing ;
Fu, Chi-Wing ;
Heng, Pheng-Ann .
IEEE TRANSACTIONS ON MEDICAL IMAGING, 2018, 37 (05) :1114-1126
[19]  
Kingma DP, 2014, ADV NEUR IN, V27
[20]   A Framework for the Recognition of High-Level Surgical Tasks From Video Images for Cataract Surgeries [J].
Lalys, F. ;
Riffaud, L. ;
Bouget, D. ;
Jannin, P. .
IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2012, 59 (04) :966-976