Explaining video summarization based on the focus of attention

Cited by: 2

Authors:

Apostolidis, Evlampios [1,2]

Balaouras, Georgios [1]

Mezaris, Vasileios [1]

Patras, Ioannis [2]

Affiliations:

[1] CERTH ITI, Thessaloniki 57001, Greece

[2] Queen Mary Univ London, London E1 4NS, England

Source:

2022 IEEE International Symposium on Multimedia (ISM) | 2022

Funding:

EU Horizon 2020;

Keywords:

Explainable AI; Video summarization; Attention mechanism; Evaluation measures;

DOI:

10.1109/ISM55400.2022.00029

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

In this paper we propose a method for explaining video summarization. We start by formulating the problem as the creation of an explanation mask that indicates the parts of the video that most influenced a video summarization network's estimates of frame importance. Then, we explain how the typical analysis pipeline of attention-based networks for video summarization can be used to define explanation signals, and we examine various attention-based signals that have been studied as explanations in the NLP domain. We evaluate these signals by investigating the video summarization network's input-output relationship under different replacement functions, using measures that quantify how well an explanation spots the most and least influential parts of a video. We run experiments using an attention-based network (CA-SUM) and two video summarization datasets (SumMe and TVSum). Our evaluations indicate the superior performance of explanations formed using the inherent attention weights, and demonstrate the ability of our method to explain video summarization results using clues about the focus of the attention mechanism.
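The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation and does not use CA-SUM: the function names, the choice of mean received attention as the explanation signal, and the zero-fill replacement function are all assumptions made here for illustration, and the toy self-attention scorer only stands in for a real summarization network.

```python
import numpy as np

def attention_explanation_mask(attn, top_k):
    """Score each frame by the attention it receives; mark the top_k frames."""
    # Assumption: attn[i, j] is the weight that frame i assigns to frame j.
    received = attn.mean(axis=0)
    top = np.argsort(-received, kind="stable")[:top_k]
    mask = np.zeros(attn.shape[0], dtype=bool)
    mask[top] = True
    return received, mask

def replace_masked(features, mask, fill_value=0.0):
    """Hypothetical replacement function: overwrite masked frames with a constant."""
    out = features.copy()
    out[mask] = fill_value
    return out

def output_change(score_fn, features, mask):
    """Mean absolute change in frame scores after replacing the masked frames."""
    before = score_fn(features)
    after = score_fn(replace_masked(features, mask))
    return float(np.mean(np.abs(before - after)))

def toy_attention(features):
    """Row-wise softmax over dot-product similarities (stand-in for a real network)."""
    logits = features @ features.T
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def toy_scores(features):
    """Toy frame-importance scores: sum of each frame's attended feature vector."""
    return (toy_attention(features) @ features).sum(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(8, 4))          # 8 frames, 4-dim features (made up)
    attn = toy_attention(frames)
    signal, mask = attention_explanation_mask(attn, top_k=2)
    print("explanation signal:", np.round(signal, 3))
    print("masked frames:", np.flatnonzero(mask))
    print("output change:", output_change(toy_scores, frames, mask))
```

Comparing the output change for the frames ranked highest by the signal against the change for the frames ranked lowest gives a rough sense of whether the signal separates the most and least influential parts of the input, which is the kind of question the evaluation measures in the abstract address.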


Pages: 146-150

Number of pages: 5

