Video Summarization Generation Network Based on Dynamic Graph Contrastive Learning and Feature Fusion

Cited by: 1

Authors:

Zhang, Jing [1]

Wu, Guangli [1]

Bi, Xinlong [1]

Cui, Yulong [1]

Affiliations:

[1] Gansu Univ Polit Sci & Law, Sch Cyberspace Secur, Lanzhou 730070, Peoples R China

Source:

ELECTRONICS | 2024 / Vol. 13 / Issue 11

Keywords:

video summarization; graph neural network; graph contrastive learning; feature fusion; LSTM

DOI:

10.3390/electronics13112039

Chinese Library Classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Video summarization aims to analyze the structure and content of a video and extract key segments to construct a summary that accurately conveys the main content, allowing users to quickly access the core information without watching the full video. However, existing methods have difficulty capturing long-term dependencies in long videos. In addition, graph structures contain a large amount of noise, which introduces redundant information and hinders effective learning of video features. To solve these problems, we propose a video summarization generation network based on dynamic graph contrastive learning and feature fusion, which consists of three main modules: feature extraction, a video encoder, and feature fusion. First, we compute the shot features and construct a dynamic graph, using the shot features as the nodes of the graph and the similarity between shot features as the edge weights. In the video encoder, we extract the temporal and structural features of the video using stacked L-G Blocks, where each L-G Block consists of a bidirectional long short-term memory network and a graph convolutional network; the output of the L-G Blocks gives the shallow-level features. To remove the redundant information in the graph, graph contrastive learning is then used to obtain optimized deep-level features. Finally, to fully exploit the feature information of the video, a feature fusion gate based on a gating mechanism is designed to fuse the shallow-level features with the deep-level features. Extensive experiments on two benchmark datasets, TVSum and SumMe, show that the proposed method outperforms most current state-of-the-art video summarization methods.
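As a rough illustration of two steps named in the abstract (graph construction from shot features, and gated fusion of shallow and deep features), the following is a minimal sketch and not the authors' implementation. It assumes cosine similarity as the edge weight (the abstract says only "similarity") and an element-wise sigmoid gate; the function names and the `gate_logits` parameter are hypothetical, and in a trained network the gate values would come from learned layers.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero feature vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def build_shot_graph(shot_features):
    """Weighted adjacency matrix: nodes are shots, edge weights are
    pairwise similarities (cosine similarity is an assumption here)."""
    n = len(shot_features)
    return [
        [cosine_similarity(shot_features[i], shot_features[j]) for j in range(n)]
        for i in range(n)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(shallow, deep, gate_logits):
    """Element-wise gate g = sigmoid(logit); output is g * shallow + (1 - g) * deep.
    The gate form is an assumption; gate_logits stands in for learned values."""
    fused = []
    for s, d, z in zip(shallow, deep, gate_logits):
        g = sigmoid(z)
        fused.append(g * s + (1.0 - g) * d)
    return fused


# Three toy shots with 2-dimensional features; shots 0 and 2 are identical.
adjacency = build_shot_graph([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
fused = gated_fusion([2.0], [4.0], [0.0])
```

In this toy input the edge between shots 0 and 2 gets the maximum weight and the edge between shots 0 and 1 gets zero weight; a gate logit of 0 weights the shallow and deep features equally.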

Pages: 15