Deep attentive and semantic preserving video summarization

Cited by: 39

Authors:

Ji, Zhong [1]

Jiao, Fang [1]

Pang, Yanwei [1]

Shao, Ling [1,2]

Affiliations:

[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China

[2] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates

Source:

NEUROCOMPUTING | 2020, Vol. 405

Funding:

National Natural Science Foundation of China;

Keywords:

Attention mechanism; Encoder-decoder; Semantic preserving; Video summarization;

DOI:

10.1016/j.neucom.2020.04.132

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video summarization shortens a lengthy video into a succinct version. Its main challenges are discovering the inherent relations between the original video and its summary while minimizing the loss of semantic information. Supervised approaches, especially those built on deep learning, have proven effective for video summarization. However, they mainly focus on one of these challenges and seldom address both simultaneously. To remedy this deficiency, we incorporate both encoder-decoder attention and a semantic preserving loss into a deep Seq2Seq framework for video summarization. Moreover, we replace the popular mean square error loss with the Huber loss to make the model more robust to outliers. Extensive experiments on two benchmark video summarization datasets demonstrate that the proposed approach consistently outperforms state-of-the-art methods. © 2020 Elsevier B.V.
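To illustrate why swapping mean square error for the Huber loss can reduce sensitivity to outliers, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the threshold `delta=1.0`, and the toy importance scores are all assumptions made for illustration, since the abstract does not state them.

```python
def mse_loss(pred, target):
    """Mean square error: every residual is squared, so outliers dominate."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def huber_loss(pred, target, delta=1.0):
    """Huber loss: quadratic for small residuals, linear beyond `delta`.

    `delta` is a hypothetical threshold; the abstract does not give a value.
    """
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        if r <= delta:
            total += 0.5 * r * r               # quadratic region
        else:
            total += delta * (r - 0.5 * delta)  # linear region
    return total / len(pred)


# Toy per-frame scores (made up); the last target is a deliberate outlier.
pred = [0.5, 0.5, 0.5]
target = [0.5, 0.5, 5.5]
print(mse_loss(pred, target), huber_loss(pred, target))
```

On this toy input the single outlying residual contributes quadratically to the mean square error but only linearly to the Huber loss, which is the robustness property the abstract refers to.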


Pages: 200-207

Page count: 8
