Adaptive undersampling and short clip-based two-stream CNN-LSTM model for surgical phase recognition on cholecystectomy videos

Cited by: 6
Authors
Lee, Sang-Goo [1, 3]
Kim, Ga-Young [2]
Hwang, Yoo-Na [1, 3]
Kwon, Ji-Yean [1, 3]
Kim, Sung-Min [1, 3]
Affiliations
[1] Dongguk Univ Seoul, Dept Med Device & Healthcare, 30 Pildong Ro 1 Gil, Seoul 04620, South Korea
[2] Johns Hopkins Univ, Sch Med, Radiat Oncol & Mol Radiat Sci, Baltimore, MD 21218 USA
[3] Dongguk Univ Seoul, Dept Regulatory Sci Med Device, 30 Pildong Ro 1 Gil, Seoul 04620, South Korea
Keywords
Automated surgical phase recognition; Cholecystectomy; Endoscopic video; Short-clip-based; Two-stream CNN-LSTMs; Undersampling; Workflow
DOI
10.1016/j.bspc.2023.105637
Chinese Library Classification
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Surgical phase recognition is challenging because imbalanced data across surgical phases causes overfitting. We propose an undersampling method based on an adaptive sampling rate that produces a similar number of samples for each surgical phase, alleviating biased learning. To further improve performance, we also introduce a two-stream CNN-LSTM model that extracts temporal information on behavioral changes between image frames. First, we extracted a total of 40,236 short clips from the entire video using an adaptive subsampling rate. Each short clip was fed into a pre-trained GoogLeNet. The resulting visual features were then passed directly to a sequence-to-sequence LSTM to extract temporal information from neighboring frames within a short clip. In parallel, a sequence-to-vector LSTM was used to extract temporal information from all successive image frames and predict the final surgical phase. The proposed method was evaluated on the public Cholec80 dataset. It outperformed state-of-the-art methods, achieving an F1-score of 87.12% and an AUC of 98.00%. In addition, the deviation in F1-score across phases decreased by about 10% compared with that before undersampling was applied. The experimental results confirm that the proposed method learns rich temporal information from short clips and outperforms the conventional one-stream CNN-LSTM architecture.
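The idea of an adaptive sampling rate can be illustrated with a minimal sketch. The abstract does not give the exact rule the authors used, so the stride formula below (frames in a phase divided by a target count) and the names `adaptive_strides`, `undersample`, and `target_per_phase` are assumptions made for illustration only, not the paper's implementation.

```python
from collections import Counter


def adaptive_strides(labels, target_per_phase):
    """Return a per-phase sampling stride (hypothetical rule).

    A phase with many frames gets a large stride, while a rare phase
    keeps a stride of 1, so every phase ends up with roughly
    target_per_phase samples.
    """
    counts = Counter(labels)
    return {phase: max(1, n // target_per_phase) for phase, n in counts.items()}


def undersample(labels, target_per_phase):
    """Return the indices of the frames kept after undersampling."""
    strides = adaptive_strides(labels, target_per_phase)
    seen = Counter()  # frames encountered so far, per phase
    kept = []
    for idx, phase in enumerate(labels):
        # Keep every strides[phase]-th frame of this phase.
        if seen[phase] % strides[phase] == 0:
            kept.append(idx)
        seen[phase] += 1
    return kept


# Toy per-frame phase labels with a strong imbalance (assumed data).
labels = ["A"] * 100 + ["B"] * 20 + ["C"] * 10
kept = undersample(labels, target_per_phase=10)
balanced_counts = Counter(labels[i] for i in kept)
```

In this toy example the dominant phase is thinned heavily while the rare phase is kept in full, which is the balancing effect the abstract describes. In a clip-based pipeline the kept indices would mark clip start positions rather than single frames.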
Pages: 9