Modality-Aware Heterogeneous Graph for Joint Video Moment Retrieval and Highlight Detection

Citations: 1
Authors
Wang, Ruomei [1]
Feng, Jiawei [1]
Zhang, Fuwei [1]
Luo, Xiaonan [2]
Luo, Yuanmao [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Natl Engn Res Ctr Digital Life, Guangzhou 510006, Peoples R China
[2] Guilin Univ Elect Technol, Sch Comp Sci & Informat Secur, Guilin 541004, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video moment retrieval; video highlight detection; heterogeneous graph; cross-modal interaction; NETWORKS;
DOI
10.1109/TCSVT.2024.3389024
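Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification code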
0808 ; 0809 ;
Abstract
The joint task of video moment retrieval and video highlight detection is challenging: it requires a model that not only captures contextual information between sequences over time but can also understand and judge significance. This paper addresses these problems from three aspects. First, we design a parameter-free cross-modal statistical correlation interaction method. A novel saliency enhancement function is defined to quantify the saliency differences between the important features associated with the query and other features, achieving parameter-free cross-modal fusion. Second, we propose a novel modality-aware heterogeneous graph reasoning mechanism (MHGR). By combining two key modules, parameter-free cross-modal statistical correlation interaction and heterogeneous graph reasoning, MHGR effectively captures the global context information between sequences, enhances the local associations between sequences, and better handles the complexity of multi-modal data. Third, a lightweight solution for the joint task of video moment retrieval and highlight detection is designed based on the above two novel algorithm modules. Comprehensive experiments are conducted on publicly available benchmark data to validate the advantages of the new solution in comparison with a series of state-of-the-art peer methods. Quantitative results consistently demonstrate that the new solution is lightweight, has high inference performance, and achieves a remarkable improvement in accuracy over peer methods. An extended ablation study further shows the contribution of each module to the solution's computational capabilities.
Pages: 8896 - 8911
Page count: 16
Related papers
14 in total
  • [1] Subtask Prior-Driven Optimized Mechanism on Joint Video Moment Retrieval and Highlight Detection
    Zhou, Siyu
    Zhang, Fuwei
    Wang, Ruomei
    Zhou, Fan
    Su, Zhuo
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (11) : 11271 - 11285
  • [2] MIM: Lightweight Multi-Modal Interaction Model for Joint Video Moment Retrieval and Highlight Detection
    Li, Jinyu
    Zhang, Fuwei
    Lin, Shujin
    Zhou, Fan
    Wang, Ruomei
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023 : 1961 - 1966
  • [3] Fine-Grained Modality Relation-Aware Network for Video Moment Retrieval
    Zhao, Yibo
    Gao, Zan
    Ma, Chunjie
    Guan, Weili
    Wang, Riwei
    Chen, Shengyong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (04) : 3315 - 3327
  • [4] Query-aware video encoder for video moment retrieval
    Hao, Jiachang
    Sun, Haifeng
    Ren, Pengfei
    Wang, Jingyu
    Qi, Qi
    Liao, Jianxin
    NEUROCOMPUTING, 2022, 483 : 72 - 86
  • [5] MLLM as video narrator: Mitigating modality imbalance in video moment retrieval
    Cai, Weitong
    Huang, Jiabo
    Gong, Shaogang
    Jin, Hailin
    Liu, Yang
    PATTERN RECOGNITION, 2025, 166
  • [6] Video Moment Retrieval via Comprehensive Relation-Aware Network
    Sun, Xin
    Gao, Jialin
    Zhu, Yizhe
    Wang, Xuan
    Zhou, Xi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 5281 - 5295
  • [7] Time-Frequency Mutual Learning for Moment Retrieval and Highlight Detection
    Zhong, Yaokun
    Liang, Tianming
    Hu, Jian-Fang
    PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024, 2025, 15035 : 34 - 48
  • [8] GPTSee: Enhancing Moment Retrieval and Highlight Detection via Description-Based Similarity Features
    Sun, Yunzhuo
    Xu, Yifang
    Xie, Zien
    Shu, Yukun
    Du, Sidan
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 521 - 525
  • [9] Boundary-Aware Noise-Resistant Video Moment Retrieval
    Yu, Fengzhen
    Gu, Xiaodong
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT III, 2024, 15018 : 193 - 206
  • [10] Temporal refinement and multi-grained matching for moment retrieval and highlight detection
    Zhu, Cunjuan
    Zhang, Yanyi
    Jia, Qi
    Wang, Weimin
    Liu, Yu
    MULTIMEDIA SYSTEMS, 2025, 31 (01)