Query-based video summarization with multi-label classification network

Cited by: 6
Authors
Hu, Weifeng [1]
Zhang, Yu [1,2]
Li, Yujun [1]
Zhao, Jia [1]
Hu, Xifeng [1]
Cui, Yan [3]
Wang, Xuejing [1]
Affiliations
[1] Shandong Univ, Sch Informat Sci & Engn, Qingdao 266200, Peoples R China
[2] State Grid China Technol Coll, Jinan 250002, Peoples R China
[3] Chinese Acad Social Sci, Inst Sociol, Beijing 100732, Peoples R China
Keywords
Deep learning; Query-based video summarization; User subjectivity; Multi-label classification; Label correlation; Averaging fusion strategy
DOI
10.1007/s11042-023-15126-1
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Generic video summarization algorithms produce a single summary for each video, which cannot satisfy the different summary requirements of different users for the same video. This paper addresses the task of query-based video summarization, which takes a user's query and a long video as inputs and aims to generate a summary tailored to that query. We propose a query-based video summarization algorithm with a multi-label classification network (MLC-SUM). Specifically, we treat video summarization as a target-based multi-label classification problem: we predict the correlation between video content and multi-concept labels by feeding convolutional features into a multi-layer perceptron, then use the cross-correlation of the labels to weight the predicted probabilities. Finally, we select the video content most relevant to the user's query sentence as the output summary. Experiments on three common datasets verify the effectiveness and superiority of the proposed algorithm.
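The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' MLC-SUM implementation: the abstract gives no architecture details, so the single hidden layer, the row-normalized correlation weighting, the mean over query concepts, and the top-k shot selection are all assumptions made for illustration. Every function name, weight matrix, and parameter below is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mlp_concept_probs(features, w_hidden, w_out):
    """Hypothetical MLP: one ReLU hidden layer, then one independent
    sigmoid per concept label (the multi-label classification step)."""
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features)))
              for row in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)))
            for row in w_out]


def reweight_with_label_correlation(probs, corr):
    """Assumed weighting scheme: each concept's score becomes the
    correlation-weighted average of all predicted probabilities.
    Assumes every row of corr has a positive sum (e.g. diagonal of 1)."""
    return [sum(c * p for c, p in zip(row, probs)) / sum(row)
            for row in corr]


def query_relevance(probs, query_concepts):
    """Assumed relevance: mean refined probability of the queried concepts."""
    return sum(probs[i] for i in query_concepts) / len(query_concepts)


def summarize(shot_features, w_hidden, w_out, corr, query_concepts, budget):
    """Score every shot against the query and keep the `budget` most
    relevant shots, returned in temporal order."""
    scores = []
    for features in shot_features:
        probs = mlp_concept_probs(features, w_hidden, w_out)
        refined = reweight_with_label_correlation(probs, corr)
        scores.append(query_relevance(refined, query_concepts))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget]), scores
```

In this sketch the shot features stand in for the convolutional features mentioned in the abstract, and `corr` stands in for a label cross-correlation matrix that would normally be estimated from training annotations. How the paper actually computes and applies that matrix is not stated in the abstract.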
Pages: 37529-37549
Number of pages: 21