Hybrid Spatiotemporal Contrastive Representation Learning for Content-Based Surgical Video Retrieval

Cited by: 2

Authors:

Kumar, Vidit [1]

Tripathi, Vikas [1]

Pant, Bhaskar [1]

Alshamrani, Sultan S. [2]

Dumka, Ankur [3]

Gehlot, Anita [4]

Singh, Rajesh [4]

Rashid, Mamoon [5]

Alshehri, Abdullah [6]

AlGhamdi, Ahmed Saeed [7]

Affiliations:

[1] Graph Era Deemed Univ, Dept Comp Sci & Engn, Dehra Dun 248002, Uttarakhand, India

[2] Taif Univ, Dept Informat Technol, Coll Comp & Informat Technol, POB 11099, At Taif 21944, Saudi Arabia

[3] Womens Inst Technol, Dept Comp Sci & Engn, Dehra Dun 248007, Uttarakhand, India

[4] Uttaranchal Univ, Div Res & Innovat, Dehra Dun 248007, Uttarakhand, India

[5] Vishwakarma Univ, Fac Sci & Technol, Dept Comp Engn, Pune 411048, Maharashtra, India

[6] Al Baha Univ, Dept Informat Technol, POB 1988, Al Baha 65731, Saudi Arabia

[7] Taif Univ, Dept Comp Engn, Coll Comp & Informat Technol, POB 11099, At Taif 21994, Saudi Arabia

Source:

ELECTRONICS | 2022 / Vol. 11 / Issue 09

Keywords:

laparoscopic video processing; recurrent deep convolutional network; surgical video retrieval; medical multimedia; temporal convolutional network; RECOGNITION; EDUCATION; TASKS;

DOI:

10.3390/electronics11091353

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Interest in minimally invasive and microscopic surgeries is growing in the medical field because of their economic and clinical benefits. These surgeries are often recorded, and the recordings have become a key resource for education, patient disease analysis, surgical error analysis, and surgical skill assessment. However, manually searching a collection of long surgical videos is extremely labor-intensive and time-consuming, which calls for an effective content-based video analysis system. Previous methods for surgical video retrieval are based on handcrafted features, which do not represent the video effectively. Deep learning-based solutions, by contrast, have been found effective in both surgical image and video analysis, and CNN-, LSTM-, and CNN-LSTM-based methods have been proposed for most surgical video analysis tasks. In this paper, we propose a hybrid spatiotemporal embedding method that enhances spatiotemporal representations using an adaptive fusion layer on top of the LSTM and temporal causal convolutional modules. To learn surgical video representations, we explore the supervised contrastive learning approach, which leverages label information in addition to augmented versions. Validating our approach on a video retrieval task with two datasets, Surgical Actions 160 and Cataract-101, we significantly improve on previous results in terms of mean average precision: 30.012 ± 1.778 vs. 22.54 ± 1.557 for Surgical Actions 160 and 81.134 ± 1.28 vs. 33.18 ± 1.311 for Cataract-101. We also validate the proposed method's suitability for the surgical phase recognition task using the benchmark Cholec80 surgical dataset, where our approach outperforms the state of the art with 90.2% accuracy.
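The abstract reports retrieval quality as mean average precision (mAP). As a rough illustration of that metric only, the sketch below ranks every other clip by cosine similarity to a query embedding and averages the per-query average precision. The leave-one-out protocol, the cosine similarity, and the function names are assumptions made for this example and are not taken from the paper; the values returned lie in [0, 1], whereas the abstract appears to quote percentages.

```python
import math
from typing import List, Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """AP for one query: mean of precision@k over the ranks k holding a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    # A query with no relevant items contributes 0 (an assumed convention).
    return precision_sum / hits if hits else 0.0


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_average_precision(embeddings: List[Sequence[float]],
                           labels: List[int]) -> float:
    """Leave-one-out retrieval (assumed protocol): each clip queries all the others."""
    aps = []
    for q in range(len(embeddings)):
        candidates = [i for i in range(len(embeddings)) if i != q]
        # Most similar clips first.
        candidates.sort(key=lambda i: cosine(embeddings[q], embeddings[i]),
                        reverse=True)
        aps.append(average_precision([labels[i] == labels[q] for i in candidates]))
    return sum(aps) / len(aps) if aps else 0.0
```

A clip counts as relevant here when it shares the query's class label, which is one common convention for content-based retrieval; other relevance definitions would change the numbers.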


Pages: 20

