Temporal Memory Relation Network for Workflow Recognition From Surgical Video

Cited by: 71
Authors
Jin, Yueming [1]
Long, Yonghao [1]
Chen, Cheng [1]
Zhao, Zixu [1]
Dou, Qi [1, 2]
Heng, Pheng-Ann [1, 3]
Affiliations
[1] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Chinese Univ Hong Kong, CUHK T Stone Robot Inst, Hong Kong, Peoples R China
[3] Chinese Acad Sci, Guangdong Hong Kong Macao Joint Lab Human Machine, Shenzhen Inst Adv Technol, Beijing 100864, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Surgery; Hidden Markov models; Feature extraction; Visualization; Training; Gallbladder; Tools; Surgical workflow recognition; long-range memory clue; multi-scale temporal convolution; non-local operation; REAL-TIME SEGMENTATION; TASKS;
DOI
10.1109/TMI.2021.3069471
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Automatic surgical workflow recognition is a key component for developing context-aware computer-assisted systems in the operating theatre. Previous works either jointly modeled the spatial features with short fixed-range temporal information, or separately learned visual and long temporal cues. In this paper, we propose a novel end-to-end temporal memory relation network (TMRNet) for relating long-range and multi-scale temporal patterns to augment the present features. We establish a long-range memory bank to serve as a memory cell storing the rich supportive information. Through our designed temporal variation layer, the supportive cues are further enhanced by multi-scale temporal-only convolutions. To effectively incorporate the two types of cues without disturbing the joint learning of spatio-temporal features, we introduce a non-local bank operator to attentively relate the past to the present. In this regard, our TMRNet enables the current feature to view the long-range temporal dependency, as well as tolerate complex temporal extents. We have extensively validated our approach on two benchmark surgical video datasets, the M2CAI challenge dataset and the Cholec80 dataset. Experimental results demonstrate the outstanding performance of our method, consistently exceeding the state-of-the-art methods by a large margin (e.g., 67.0% vs. 78.9% Jaccard on the Cholec80 dataset).
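The idea of relating a current frame feature to a long-range memory bank can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the causal moving averages used as a stand-in for learned temporal-only convolutions, the element-wise max fusion, and the random projection matrices `w_q`, `w_k`, `w_v` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_scale_temporal_smooth(bank, kernel_sizes=(3, 5)):
    """Hypothetical stand-in for multi-scale temporal-only convolutions.

    bank: (L, C) array of past frame features, oldest first.
    Applies causal moving averages along the time axis at several
    window sizes and fuses them with the input by element-wise max.
    """
    length = bank.shape[0]
    outputs = [bank]
    for k in kernel_sizes:
        # Left-pad with the first frame so each window only sees the past.
        pad = np.repeat(bank[:1], k - 1, axis=0)
        padded = np.concatenate([pad, bank], axis=0)
        smoothed = np.stack([padded[t:t + k].mean(axis=0) for t in range(length)])
        outputs.append(smoothed)
    return np.max(np.stack(outputs), axis=0)


def non_local_bank_read(current, bank, w_q, w_k, w_v):
    """Attend from the current feature (C,) over the memory bank (L, C).

    Returns the current feature augmented with an attention-weighted
    summary of the bank (residual connection), plus the attention weights.
    """
    q = current @ w_q                      # (D,)
    k = bank @ w_k                         # (L, D)
    v = bank @ w_v                         # (L, C)
    weights = softmax(k @ q / np.sqrt(q.shape[0]))  # (L,)
    return current + weights @ v, weights


rng = np.random.default_rng(0)
bank = rng.normal(size=(30, 8))            # 30 past frames, 8-dim features
current = rng.normal(size=8)
w_q = rng.normal(size=(8, 4))
w_k = rng.normal(size=(8, 4))
w_v = rng.normal(size=(8, 8))

enhanced_bank = multi_scale_temporal_smooth(bank)
augmented, attn = non_local_bank_read(current, enhanced_bank, w_q, w_k, w_v)
```

In this sketch the attention weights form a distribution over the stored frames, so the augmented feature is the present feature plus a weighted summary of the past; in a trained network the projections and temporal filters would be learned rather than random or fixed.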
Pages: 1911-1923
Page count: 13