Adaptive undersampling and short clip-based two-stream CNN-LSTM model for surgical phase recognition on cholecystectomy videos

Cited by: 6

Authors:

Lee, Sang-Goo [1,3]

Kim, Ga-Young [2]

Hwang, Yoo-Na [1,3]

Kwon, Ji-Yean [1,3]

Kim, Sung-Min [1,3]

Affiliations:

[1] Dongguk Univ Seoul, Dept Med Device & Healthcare, 30 Pildong Ro 1 Gil, Seoul 04620, South Korea

[2] Johns Hopkins Univ, Sch Med, Radiat Oncol & Mol Radiat Sci, Baltimore, MD 21218 USA

[3] Dongguk Univ Seoul, Dept Regulatory Sci Med Device, 30 Pildong Ro 1 Gil, Seoul 04620, South Korea

Source:

BIOMEDICAL SIGNAL PROCESSING AND CONTROL | 2024 / Vol. 88

Keywords:

Automated surgical phase recognition; Cholecystectomy; Endoscopic video; Short-clip-based; Two-stream CNN-LSTMs; Undersampling; WORKFLOW;

DOI:

10.1016/j.bspc.2023.105637

Chinese Library Classification (CLC) number:

R318 [Biomedical Engineering];

Subject classification code:

0831;

Abstract:

Surgical phase recognition is challenging because imbalanced data across surgical phases lead to overfitting. We proposed an adaptive sampling rate-based undersampling method that produces a similar number of samples for each surgical phase to alleviate biased learning. To improve the performance of our method, we also introduced a two-stream CNN-LSTM model that extracts temporal information on behavioral changes between image frames. First, we extracted a total of 40,236 short clips from the entire video using an adaptive subsampling rate. Each short clip was fed into a pre-trained GoogLeNet. The output, which carries visual information, was then fed directly into a sequence-to-sequence LSTM model to extract temporal information from neighboring frames within a short clip. At the same time, a separate sequence-to-vector LSTM was used to extract temporal information from all successive image frames and predict the final surgical phase. The proposed method was evaluated on the public Cholec80 dataset. It outperformed state-of-the-art methods, achieving an F1-score of 87.12% and an AUC of 98.00%. In addition, the deviation in F1-score across phases decreased by about 10% compared with that before undersampling was applied. Experimental results confirmed that the proposed method can learn rich temporal information from short clips and that it outperforms the conventional one-stream CNN-LSTM architecture.
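The abstract does not state how the adaptive sampling rate is computed, so the following is only a minimal Python sketch of one plausible reading: each phase gets its own frame stride, chosen so that every phase contributes roughly the same number of samples. The function names `adaptive_strides` and `undersample`, the `target_per_phase` parameter, and the toy labels are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def adaptive_strides(phase_labels, target_per_phase):
    """Hypothetical helper: pick a per-phase stride so that each phase
    yields about `target_per_phase` samples."""
    counts = Counter(phase_labels)
    # A phase with fewer frames than the target keeps every frame (stride 1).
    return {phase: max(1, n // target_per_phase) for phase, n in counts.items()}


def undersample(phase_labels, strides):
    """Hypothetical helper: return the indices of kept frames, taking every
    strides[phase]-th frame of each phase."""
    seen = Counter()
    kept = []
    for idx, phase in enumerate(phase_labels):
        if seen[phase] % strides[phase] == 0:
            kept.append(idx)
        seen[phase] += 1
    return kept


# Toy per-frame labels (assumed data): phase "A" dominates, phase "C" is rare.
labels = ["A"] * 1000 + ["B"] * 200 + ["C"] * 50
strides = adaptive_strides(labels, target_per_phase=50)
kept = undersample(labels, strides)
balanced = Counter(labels[i] for i in kept)
```

In this toy case the dominant phase is sampled sparsely and the rare phase is kept in full, so the per-phase counts end up equal. In a clip-based pipeline, the kept indices could serve as clip start positions, though how the paper forms its 40,236 short clips is not described in the abstract.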

Pages: 9
