Efficient Transformer for Video Summarization

Citations: 0
Authors
Kolmakova, Tatiana [1]
Makarov, Ilya [2, 3]
Affiliations
[1] HSE Univ, Moscow, Russia
[2] Artificial Intelligence Res Inst AIRI, Moscow, Russia
[3] NUST MISiS, AI Ctr, Moscow, Russia
Source
ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2023, PT II | 2023 / Vol. 14135
Keywords
Video Summarization; Deep Learning; Transformers; CREATION;
DOI
10.1007/978-3-031-43078-7_5
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The amount of user-generated content is increasing daily. This is especially true for video content, which became popular through social media platforms such as TikTok. Other internet sources are keeping pace and making video sharing easier. That is why automatic tools that extract the core information of content while reducing its volume are essential. Video summarization aims to address this need. In this work, we propose a transformer-based approach to supervised video summarization. Previous applications of attention architectures either used lighter versions or augmented the models with RNN modules, which slow down computation. Our proposed framework takes full advantage of transformers. Extensive evaluation on two benchmark datasets showed that the introduced model outperforms existing approaches on the SumMe dataset by 3% and shows comparable results on the TVSum dataset.
Pages: 52-65
Page count: 14