Query-Oriented Micro-Video Summarization

Cited by: 0
Authors
Jia, Mengzhao [1]
Wei, Yinwei [2]
Song, Xuemeng [1]
Sun, Teng [1]
Zhang, Min [3]
Nie, Liqiang [3]
Affiliations
[1] Shandong Univ, Dept Comp Sci & Technol, Qingdao 250100, Peoples R China
[2] Monash Univ, Fac Informat Technol, Clayton, Vic 3800, Australia
[3] Harbin Inst Technol, Sch Comp, Shenzhen 150001, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Video summarization; query suggestion; micro-video retrieval
DOI
10.1109/TPAMI.2024.3355402
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The query-oriented micro-video summarization task aims to generate a concise sentence with two properties: (a) it summarizes the main semantics of the micro-video, and (b) it is expressed in the form of a search query to facilitate retrieval. Despite its enormous application value in retrieval, this direction has barely been explored. Previous summarization studies mostly focus on content summarization for traditional long videos. Directly applying these methods tends to yield unsatisfactory results because of the unique features of micro-videos and queries: diverse entities and complex scenes within a short time, semantic gaps between modalities, and varied queries with distinct expressions. To adapt to these characteristics, we propose a query-oriented micro-video summarization model, dubbed QMS. It employs an encoder-decoder transformer architecture as its backbone. The multi-modal (visual and textual) signals are passed through two modality-specific encoders to obtain their representations, followed by an entity-aware representation learning module that identifies and highlights critical entity information. For optimization, given the large semantic gaps between modalities, we assign different confidence scores according to their semantic relevance. Additionally, we develop a novel strategy to sample an effective target query from the diverse query set with various expressions. Extensive experiments demonstrate the superiority of QMS over several state-of-the-art methods on both the summarization and retrieval tasks.
Pages: 4174-4187
Number of pages: 14