Escaping local minima in deep reinforcement learning for video summarization

Times cited: 0
Authors
Alexoudi, Panagiota [1]
Mademlis, Ioannis [1]
Pitas, Ioannis [1]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki, Greece
Source
PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023 | 2023
Keywords
video summarization; key-frame extraction; unsupervised learning; deep reinforcement learning;
DOI
10.1145/3591106.3592288
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
State-of-the-art deep neural unsupervised video summarization methods mostly fall under the adversarial reconstruction framework. This framework employs a Generative Adversarial Network (GAN) structure and Long Short-Term Memory (LSTM) autoencoders during its training stage. The typical result is a selector LSTM that sequentially receives video frame representations and outputs corresponding scalar importance factors, which are then used to select key-frames. This basic approach has been augmented with an additional Deep Reinforcement Learning (DRL) agent, trained using the Discriminator's output as a reward, which learns to optimize the selector's outputs. However, local minima are a well-known problem in DRL. Thus, this paper presents a novel regularizer for escaping local loss minima, in order to improve unsupervised key-frame extraction. It is an additive loss term, employed during a second training phase, which rewards the difference of the neural agent's parameters from those of a previously found good solution. It thereby encourages the training process to explore the parameter space more aggressively in order to discover a better local loss minimum. Evaluation performed on two public datasets shows considerable improvements over the baseline and over the state of the art.
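As a rough illustration of the kind of regularizer the abstract describes, the Python sketch below subtracts a weighted parameter distance from a base loss, so that moving away from a previously found solution lowers the total loss. The abstract does not state the exact form of the term; the Euclidean distance, the `weight` coefficient, and the names `regularized_loss`, `params`, and `anchor_params` are all assumptions made for illustration, not the authors' formulation.

```python
import math


def regularized_loss(base_loss, params, anchor_params, weight=0.1):
    """Hypothetical sketch: base loss plus an additive term that rewards
    distance from a previously found solution (anchor_params).

    The Euclidean distance and the `weight` coefficient are assumptions;
    the source abstract only says the term rewards parameter difference.
    """
    # Euclidean distance between current and anchor parameters.
    distance = math.sqrt(
        sum((p - a) ** 2 for p, a in zip(params, anchor_params))
    )
    # Subtracting the distance means a larger difference gives a lower loss.
    return base_loss - weight * distance


# Parameters identical to the anchor: the loss is unchanged.
same = regularized_loss(1.0, [0.0, 0.0], [0.0, 0.0])
# Parameters far from the anchor: the loss is reduced.
moved = regularized_loss(1.0, [3.0, 4.0], [0.0, 0.0])
```

With the same base loss, the second call returns a smaller value than the first, which is the sense in which such a term would push a second training phase away from the earlier solution.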
Pages: 530-534
Number of pages: 5