Egocentric video co-summarization using transfer learning and refined random walk on a constrained graph

Cited by: 1
Authors
Sahu, Abhimanyu [1 ]
Chowdhury, Ananda S. [1 ]
Affiliations
[1] Jadavpur Univ, Dept Elect & Telecommun Engn, Kolkata 700032, India
Keywords
Egocentric video; Transfer learning; Constrained graph; Random walks; Label refinement;
DOI
10.1016/j.patcog.2022.109128
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we address the problem of egocentric video co-summarization. We show how an accurate shot-level summary can be obtained in a time-efficient manner using a random walk on a constrained graph in a transfer-learned feature space with label refinement. While applying transfer learning, we propose a new loss function that captures egocentric characteristics in a ResNet pre-trained on a set of auxiliary egocentric videos. Transfer learning is used to generate (i) an improved feature space and (ii) a set of labels to be used as seeds for the test egocentric video. A complete weighted graph is created for a test video in the new transfer-learned feature space, with shots as the vertices. We derive two types of cluster-label constraints, Must-Link (ML) and Cannot-Link (CL), based on the similarity of the shots. ML constraints are used to prune the complete graph, which is shown to yield a substantial computational advantage, especially for long-duration videos. We derive expressions for the number of vertices and edges of the ML-constrained graph and show that this graph remains connected. A random walk is applied to obtain labels for the unmarked shots in this new graph. CL constraints are then applied to refine the cluster labels. Finally, the shots closest to the individual cluster centres are used to build the summary. Experiments on short-duration videos from the CoSum and TVSum datasets and long-duration videos from the ADL and EPIC-Kitchens datasets clearly demonstrate the advantage of our solution over several state-of-the-art methods. (C) 2022 Elsevier Ltd. All rights reserved.
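The core step the abstract describes — a random walk over a shot-similarity graph that spreads seed labels (obtained via transfer learning) to unmarked shots — can be sketched as below. This is a minimal illustration, not the authors' implementation: the affinity matrix `W`, the seed assignment, and the iteration count are toy assumptions, and the ML pruning and CL refinement stages are omitted.

```python
import numpy as np

# Toy shot-similarity graph: 5 shots, symmetric affinity matrix.
# (Illustrative values only; the paper builds this graph in a
# transfer-learned feature space and prunes edges with ML constraints.)
W = np.array([
    [0.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.1],
    [0.8, 0.7, 0.0, 0.2, 0.1],
    [0.1, 0.0, 0.2, 0.0, 0.9],
    [0.0, 0.1, 0.1, 0.9, 0.0],
])

# Hypothetical seed labels from transfer learning:
# shot 0 -> cluster 0, shot 4 -> cluster 1, shots 1-3 unmarked (-1).
seeds = np.array([0, -1, -1, -1, 1])
n_clusters = 2

# One-hot seed matrix Y.
Y = np.zeros((len(seeds), n_clusters))
for i, c in enumerate(seeds):
    if c >= 0:
        Y[i, c] = 1.0

# Row-normalized transition matrix P = D^{-1} W of the random walk.
P = W / W.sum(axis=1, keepdims=True)

# Iterative propagation: take one walk step, then clamp the seeds
# back to their known labels so they keep driving the diffusion.
F = Y.copy()
for _ in range(100):
    F = P @ F
    F[seeds >= 0] = Y[seeds >= 0]

# Each unmarked shot takes the cluster with the highest arrival mass.
labels = F.argmax(axis=1)
print(labels)  # -> [0 0 0 1 1]
```

With this toy graph, the strongly connected shots 0-2 inherit cluster 0 and shots 3-4 inherit cluster 1; in the paper, a CL refinement pass would then correct any labels that violate Cannot-Link constraints before the shots nearest each cluster centre are selected for the summary.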
Pages: 10