Unleashing the Power of Contrastive Learning for Zero-Shot Video Summarization

Cited by: 0

Authors:

Pang, Zongshang ^{[1]}

Nakashima, Yuta ^{[1]}

Otani, Mayu ^{[2]}

Nagahara, Hajime ^{[1]}

Affiliations:

[1] Osaka Univ, Intelligence & Sensing Lab, Suita 5650871, Japan

[2] CyberAgent Inc, Tokyo 1500042, Japan

Source:

JOURNAL OF IMAGING | 2024 / Vol. 10 / Issue 09

Keywords:

video summarization; contrastive learning; visual pre-training;

DOI:

10.3390/jimaging10090229

Chinese Library Classification number:

TB8 [Photographic Technology];

Discipline classification code:

0804 ;

Abstract:

Video summarization aims to select the most informative subset of frames in a video to facilitate efficient video browsing. Past efforts have invariably involved training summarization models with annotated summaries or heuristic objectives. In this work, we reveal that features pre-trained on image-level tasks contain rich semantic information that can be readily leveraged to quantify frame-level importance for zero-shot video summarization. Leveraging pre-trained features and contrastive learning, we propose three metrics that characterize a desirable keyframe: local dissimilarity, global consistency, and uniqueness. We show that these metrics capture well the diversity and representativeness of frames, the criteria commonly used for the unsupervised generation of video summaries, and that they achieve competitive or better performance compared to past methods without requiring any training. We further propose a contrastive learning-based pre-training strategy on unlabeled videos to enhance the quality of the proposed metrics and thus improve performance on the public benchmarks TVSum and SumMe.

Pages: 20

References: 82 in total
