Keyframe extraction from laparoscopic videos based on visual saliency detection

Cited by: 26
Authors
Loukas, Constantinos [1]
Varytimidis, Christos [2]
Rapantzikos, Konstantinos [2]
Kanakis, Meletios A. [3]
Affiliations
[1] Univ Athens, Med Sch, Lab Med Phys, Mikras Asias 75 Str, Athens 11527, Greece
[2] Natl & Tech Univ Athens, Sch Elect & Comp Engn, Athens, Greece
[3] Great Ormond St Hosp Sick Children, Cardiothorac Surg Unit, London, England
Keywords
Video analysis; Keyframe extraction; Hidden Markov multivariate autoregressive models; Visual saliency; Attention driven framework; Endoscopic surgery videos; Key-frames; Classification; Models
DOI
10.1016/j.cmpb.2018.07.004
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Background and objective: Laparoscopic surgery offers the potential for video recording of the operation, which is important for technique evaluation, cognitive training, patient briefing and documentation. An effective way to represent video content is to extract a limited number of keyframes that carry semantic information. In this paper we present a novel method for keyframe extraction from individual shots of the operation video.
Methods: The laparoscopic video was first segmented into video shots using an objectness model, which was trained to capture significant changes in the endoscope field of view. Each frame of a shot was then decomposed into three saliency maps in order to model the preference of human vision for regions with higher differentiation with respect to color, motion and texture. The accumulated responses from each map provided a 3D time series of saliency variation across the shot. The time series was modeled as a multivariate autoregressive process with hidden Markov states (HMMAR model). This approach allowed the temporal segmentation of the shot into a predefined number of states. A representative keyframe was extracted from each state based on the highest state-conditional probability of the corresponding saliency vector.
Results: Our method was tested on 168 video shots extracted from various laparoscopic cholecystectomy operations from the publicly available Cholec80 dataset. Four state-of-the-art methods were used for comparison. The evaluation was based on two assessment metrics: Color Consistency Score (CCS), which measures the color distance between the ground truth (GT) and the closest keyframe, and Temporal Consistency Score (TCS), which considers the temporal proximity between GT and extracted keyframes. About 81% of the extracted keyframes matched the color content of the GT keyframes, compared to 77% yielded by the second-best method. The TCS values of the proposed method and the second-best method were close to 1.9 and 1.4, respectively.
Conclusions: Our results demonstrated that the proposed method outperforms the compared methods in terms of content and temporal consistency with the ground truth. The extracted keyframes provided highly semantic information that may be used for various applications related to surgical video content representation, such as workflow analysis, video summarization and retrieval. (C) 2018 Elsevier B.V. All rights reserved.
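Illustrative sketch (not part of the original record): the last step described under Methods, picking one keyframe per state by the highest state-conditional probability of its saliency vector, can be approximated as below. This is a simplification under stated assumptions, not the authors' HMMAR model: the state labels are taken as already given, each state is assumed to emit saliency vectors from a Gaussian with diagonal covariance, and the names `select_keyframes`, `saliency`, `states` and `eps` are hypothetical.

```python
from statistics import fmean, pvariance


def select_keyframes(saliency, states, eps=1e-6):
    """Return {state: frame_index}, one representative frame per state.

    saliency: per-frame vectors, e.g. [color, motion, texture] responses.
    states:   per-frame state labels (assumed to come from a prior
              temporal segmentation; the HMMAR fit itself is not shown).

    Assumption: each state emits vectors from a diagonal-covariance
    Gaussian. Within one state, the most probable frame is then the one
    with the smallest variance-scaled squared distance to the state mean.
    """
    # Group frame indices by state label, preserving temporal order.
    frames_by_state = {}
    for t, s in enumerate(states):
        frames_by_state.setdefault(s, []).append(t)

    keyframes = {}
    for s, frames in frames_by_state.items():
        # Per-channel mean and variance of the saliency vectors in this state.
        channels = list(zip(*(saliency[t] for t in frames)))
        means = [fmean(c) for c in channels]
        variances = [pvariance(c) + eps for c in channels]  # eps avoids division by zero

        def distance(t):
            return sum((x - m) ** 2 / v
                       for x, m, v in zip(saliency[t], means, variances))

        keyframes[s] = min(frames, key=distance)
    return keyframes
```

Calling `select_keyframes(saliency, states)` with a list of three-element saliency vectors and a matching list of state labels returns a dictionary that maps each state to the index of its selected frame.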
Pages: 13-23
Number of pages: 11
Related papers
50 in total
  • [1] Keyframe Extraction From Laparoscopic Videos via Diverse and Weighted Dictionary Selection
    Ma, Mingyang
    Mei, Shaohui
    Wan, Shuai
    Wang, Zhiyong
    Ge, Zongyuan
    Lam, Vincent
    Feng, Dagan
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2021, 25 (05) : 1686 - 1698
  • [2] Selective extraction of visual saliency objects in images and videos
    Zhao, Zhi-Cheng
    Cai, An-Ni
    2007 THIRD INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, VOL 1, PROCEEDINGS, 2007, : 198 - 201
  • [3] Prediction of remaining surgery duration in laparoscopic videos based on visual saliency and the transformer network
    Loukas, Constantinos
    Seimenis, Ioannis
    Prevezanou, Konstantina
    Schizas, Dimitrios
    INTERNATIONAL JOURNAL OF MEDICAL ROBOTICS AND COMPUTER ASSISTED SURGERY, 2024, 20 (02):
  • [4] Concept detection and keyframe extraction using a visual thesaurus
    Spyrou, Evaggelos
    Tolias, Giorgos
    Mylonas, Phivos
    Avrithis, Yannis
    MULTIMEDIA TOOLS AND APPLICATIONS, 2009, 41 (03) : 337 - 373
  • [5] Concept detection and keyframe extraction using a visual thesaurus
    Evaggelos Spyrou
    Giorgos Tolias
    Phivos Mylonas
    Yannis Avrithis
    Multimedia Tools and Applications, 2009, 41 : 337 - 373
  • [6] SPATIAL KEYFRAME EXTRACTION OF MOBILE VIDEOS FOR EFFICIENT OBJECT DETECTION AT THE EDGE
    Constantinou, George
    Shahabi, Cyrus
    Kim, Seon Ho
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1466 - 1470
  • [7] Multiscale motion saliency for keyframe extraction from motion capture sequences
    Halit, Cihan
    Capin, Tolga
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2011, 22 (01) : 3 - 14
  • [8] Keyframe Selection for Robust Pose Estimation in Laparoscopic Videos
    von Oehsen, Udo
    Marcinczak, Jan Marek
    Velez, Andres Felipe Marmol
    Grigat, Rolf-Rainer
    MEDICAL IMAGING 2012: IMAGE-GUIDED PROCEDURES, ROBOTIC INTERVENTIONS, AND MODELING, 2012, 8316
  • [9] Encoding based Saliency Detection for Videos and Images
    Mauthner, Thomas
    Possegger, Horst
    Waltner, Georg
    Bischof, Horst
    2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015, : 2494 - 2502
  • [10] Underwater image feature extraction and matching based on visual saliency detection
    Zhang, Lunjuan
    He, Bo
    Song, Yan
    Yan, Tianhong
    OCEANS 2016 - SHANGHAI, 2016,