Hybrid Spatiotemporal Contrastive Representation Learning for Content-Based Surgical Video Retrieval

Cited by: 2
Authors
Kumar, Vidit [1]
Tripathi, Vikas [1]
Pant, Bhaskar [1]
Alshamrani, Sultan S. [2]
Dumka, Ankur [3]
Gehlot, Anita [4]
Singh, Rajesh [4]
Rashid, Mamoon [5]
Alshehri, Abdullah [6]
AlGhamdi, Ahmed Saeed [7]
Affiliations
[1] Graph Era Deemed Univ, Dept Comp Sci & Engn, Dehra Dun 248002, Uttarakhand, India
[2] Taif Univ, Dept Informat Technol, Coll Comp & Informat Technol, POB 11099, At Taif 21944, Saudi Arabia
[3] Womens Inst Technol, Dept Comp Sci & Engn, Dehra Dun 248007, Uttarakhand, India
[4] Uttaranchal Univ, Div Res & Innovat, Dehra Dun 248007, Uttarakhand, India
[5] Vishwakarma Univ, Fac Sci & Technol, Dept Comp Engn, Pune 411048, Maharashtra, India
[6] Al Baha Univ, Dept Informat Technol, POB 1988, Al Baha 65731, Saudi Arabia
[7] Taif Univ, Dept Comp Engn, Coll Comp & Informat Technol, POB 11099, At Taif 21994, Saudi Arabia
Keywords
laparoscopic video processing; recurrent deep convolutional network; surgical video retrieval; medical multimedia; temporal convolutional network; RECOGNITION; EDUCATION; TASKS;
DOI
10.3390/electronics11091353
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Minimally invasive and microscopic surgeries are attracting growing interest in the medical field because of their economic and clinical benefits. These surgeries are often recorded, and the recordings have become a key resource for education, patient disease analysis, surgical error analysis, and surgical skill assessment. However, manually searching a collection of long surgical videos is extremely labor-intensive and time-consuming, which calls for an effective content-based video analysis system. Previous methods for surgical video retrieval rely on handcrafted features, which do not represent the video effectively. Deep learning-based solutions, by contrast, have proven effective in both surgical image and video analysis, with CNN-, LSTM-, and CNN-LSTM-based methods proposed for most surgical video analysis tasks. In this paper, we propose a hybrid spatiotemporal embedding method that enhances spatiotemporal representations using an adaptive fusion layer on top of the LSTM and temporal causal convolutional modules. To learn surgical video representations, we explore supervised contrastive learning, which leverages label information in addition to augmented versions. Validating our approach on a video retrieval task with two datasets, Surgical Actions 160 and Cataract-101, we significantly improve on previous results in terms of mean average precision: 30.012 ± 1.778 vs. 22.54 ± 1.557 for Surgical Actions 160 and 81.134 ± 1.28 vs. 33.18 ± 1.311 for Cataract-101. We also validate the proposed method's suitability for the surgical phase recognition task on the benchmark Cholec80 surgical dataset, where our approach outperforms the state of the art with 90.2% accuracy.
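The abstract reports retrieval quality as mean average precision (mAP). As a rough illustration of how such a metric is commonly computed, the sketch below ranks results per query and averages the precision at each relevant hit. It is not the authors' evaluation code: the function names, the toy labels, and the rule that a retrieved clip counts as relevant when its class label matches the query's are all assumptions made for this example, and it assumes each ranked list covers the whole searchable collection.

```python
from typing import List, Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """AP for one query, given relevance flags of the ranked results (best match first)."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision measured at this relevant hit
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    query_labels: Sequence[str], ranked_labels: Sequence[Sequence[str]]
) -> float:
    """Mean of per-query AP; a result is treated as relevant if its label equals the query label."""
    aps: List[float] = [
        average_precision([label == query for label in ranked])
        for query, ranked in zip(query_labels, ranked_labels)
    ]
    return sum(aps) / len(aps) if aps else 0.0


# Toy data with hypothetical labels (not taken from the paper's datasets).
queries = ["suturing", "cutting"]
rankings = [
    ["suturing", "cutting", "suturing"],  # AP = (1/1 + 2/3) / 2
    ["suturing", "cutting", "cutting"],   # AP = (1/2 + 2/3) / 2
]
print(round(mean_average_precision(queries, rankings), 4))  # prints 0.7083
```

Under these assumptions, a query whose relevant clips all appear at the top of the ranking scores an AP of 1.0, and every relevant clip pushed further down lowers the score.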
Pages: 20