Escaping local minima in deep reinforcement learning for video summarization

Cited by: 0

Authors:

Alexoudi, Panagiota [1]

Mademlis, Ioannis [1]

Pitas, Ioannis [1]

Affiliations:

[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki, Greece

Source:

PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023 | 2023

Keywords:

video summarization; key-frame extraction; unsupervised learning; deep reinforcement learning;

DOI:

10.1145/3591106.3592288

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

State-of-the-art deep neural unsupervised video summarization methods mostly fall under the adversarial reconstruction framework. This framework employs a Generative Adversarial Network (GAN) structure and Long Short-Term Memory (LSTM) autoencoders during its training stage. The typical result is a selector LSTM that sequentially receives video frame representations and outputs corresponding scalar importance factors, which are then used to select key-frames. This basic approach has been augmented with an additional Deep Reinforcement Learning (DRL) agent, trained using the Discriminator's output as a reward, which learns to optimize the selector's outputs. However, local minima are a well-known problem in DRL. Thus, this paper presents a novel regularizer for escaping local loss minima, in order to improve unsupervised key-frame extraction. The regularizer is an additive loss term, employed during a second training phase, that rewards the difference between the neural agent's parameters and those of a previously found good solution. It thereby encourages the training process to explore the parameter space more aggressively in order to discover a better local loss minimum. Evaluation on two public datasets shows considerable improvements over both the baseline and the state-of-the-art.
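The regularizer described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact formula, so everything here is an assumption made for illustration: the squared Euclidean distance as the measure of parameter difference, the weight `lam`, and the function names `parameter_distance_sq` and `regularized_loss`. Parameters are represented as flat lists of floats to keep the example self-contained.

```python
from typing import Sequence


def parameter_distance_sq(theta: Sequence[float], theta_ref: Sequence[float]) -> float:
    """Squared Euclidean distance between two flat parameter vectors.

    Assumption: the abstract only says the term "rewards the difference";
    the squared L2 distance is one plausible choice, not the paper's stated one.
    """
    if len(theta) != len(theta_ref):
        raise ValueError("parameter vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(theta, theta_ref))


def regularized_loss(
    base_loss: float,
    theta: Sequence[float],
    theta_ref: Sequence[float],
    lam: float = 0.1,  # hypothetical regularization weight
) -> float:
    """Second-phase loss: base loss minus a reward for moving away from theta_ref.

    theta_ref stands for the previously found good solution. Subtracting the
    distance term lowers the loss as the current parameters move away from it,
    which pushes the optimizer to explore other regions of parameter space.
    """
    return base_loss - lam * parameter_distance_sq(theta, theta_ref)
```

With `theta` equal to `theta_ref` the extra term vanishes and the loss reduces to `base_loss`. Because the reward in this sketch is unbounded, a practical version would probably need to cap or schedule the term so that it does not dominate the base loss.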

Pages: 530-534

Page count: 5

References (26 in total):

[1] Apostolidis E., 2019, P INT WORKSH AI SMAR

[2] AC-SUM-GAN: Connecting Actor-Critic and Generative Adversarial Networks for Unsupervised Video Summarization [J]. Apostolidis, Evlampios; Adamantidou, Eleni; Metsai, Alexandros I.; Mezaris, Vasileios; Patras, Ioannis. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (08): 3278-3292

[3] Unsupervised Video Summarization via Attention-Driven Adversarial Learning [J]. Apostolidis, Evlampios; Adamantidou, Eleni; Metsai, Alexandros I.; Mezaris, Vasileios; Patras, Ioannis. MULTIMEDIA MODELING (MMM 2020), PT I, 2020, 11961: 492-504

[4] Gonuguntla N., 2019, P BRIT MACHINE VISIO

[5] Goodfellow I., 2014, PROC NEURIP, P2672, DOI 10.1145/3422622

[6] Creating Summaries from User Videos [J]. Gygli, Michael; Grabner, Helmut; Riemenschneider, Hayko; Van Gool, Luc. COMPUTER VISION - ECCV 2014, PT VII, 2014, 8695: 505-520

[7] Haarnoja T., 2018, PR MACH LEARN RES, V80

[8] Unsupervised Video Summarization with Attentive Conditional Generative Adversarial Networks [J]. He, Xufeng; Hua, Yang; Song, Tao; Zhang, Zongpu; Xue, Zhengui; Ma, Ruhui; Robertson, Neil; Guan, Haibing. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019: 2296-2304

[9] Jung Y., 2019, AAAI CONF ARTIF INTE, P8537

[10] Kaseris M., 2021, P IEEE INT C IMAGE P
