Modality-Aware Heterogeneous Graph for Joint Video Moment Retrieval and Highlight Detection

Cited by: 1

Authors:

Wang, Ruomei [1]

Feng, Jiawei [1]

Zhang, Fuwei [1]

Luo, Xiaonan [2]

Luo, Yuanmao [1]

Affiliations:

[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Natl Engn Res Ctr Digital Life, Guangzhou 510006, Peoples R China

[2] Guilin Univ Elect Technol, Sch Comp Sci & Informat Secur, Guilin 541004, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 09

Funding:

National Natural Science Foundation of China;

Keywords:

Video moment retrieval; video highlight detection; heterogeneous graph; cross-modal interaction; NETWORKS;

DOI:

10.1109/TCSVT.2024.3389024

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

The joint task of video moment retrieval and video highlight detection is challenging: it requires a model that not only captures temporal contextual information between sequences but can also understand and judge significance. This paper addresses these problems from three aspects. First, we design a parameter-free cross-modal statistical correlation interaction method. A novel saliency enhancement function is defined to quantify the saliency differences between the important features associated with the query and other features, achieving parameter-free cross-modal fusion. Second, we propose a novel modality-aware heterogeneous graph reasoning mechanism (MHGR). By combining two key modules, parameter-free cross-modal statistical correlation interaction and a heterogeneous graph reasoning mechanism, MHGR effectively captures the global context information between sequences, enhances the local association relationships between sequences, and handles the complexity of multi-modal data better. Third, a lightweight solution for the joint task of video moment retrieval and highlight detection is designed based on the above two novel algorithm modules. Comprehensive experiments are conducted on publicly available benchmark data to validate the advantages of the new solution in comparison with a series of state-of-the-art peer methods. Quantitative results consistently demonstrate that the new solution is lightweight and has high inference performance, and that it achieves a remarkable improvement in accuracy over peer methods. An extended ablation study is further conducted to show the contribution of each module of the solution to its computational capabilities.
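
The abstract does not state the saliency enhancement function itself, so the following is only an illustrative sketch of what a parameter-free cross-modal interaction could look like, not the authors' method. It assumes clip and query features are plain lists of floats, uses cosine similarity as the correlation statistic, and uses a sigmoid of the standardized similarity as a stand-in saliency score. The function names (`cosine`, `mean_pool`, `saliency_weights`, `fuse`) are hypothetical. No learnable parameters are involved.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def saliency_weights(clip_feats, query_feats):
    """Assumed saliency score: sigmoid of each clip's standardized similarity to the query.

    Clips more similar to the query than average get a weight above 0.5,
    less similar clips get a weight below 0.5.
    """
    query = mean_pool(query_feats)
    sims = [cosine(c, query) for c in clip_feats]
    mu = sum(sims) / len(sims)
    sigma = math.sqrt(sum((s - mu) ** 2 for s in sims) / len(sims))
    if sigma < 1e-12:
        # All clips are equally similar to the query: no saliency difference.
        return [0.5] * len(sims)
    return [1.0 / (1.0 + math.exp(-(s - mu) / sigma)) for s in sims]


def fuse(clip_feats, query_feats):
    """Reweight each clip feature by its saliency score (no trainable weights)."""
    weights = saliency_weights(clip_feats, query_feats)
    return [[w * x for x in clip] for w, clip in zip(weights, clip_feats)]


# Usage: three 2-D clip features and a two-token query (toy values).
clips = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_tokens = [[1.0, 0.0], [1.0, 0.0]]
weights = saliency_weights(clips, query_tokens)
fused = fuse(clips, query_tokens)
```

In this toy input the first clip is aligned with the query and the second is orthogonal to it, so the first clip receives the largest weight and the second the smallest. How the paper actually measures statistical correlation and saliency differences may differ substantially from this stand-in.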


Pages: 8896 - 8911

Page count: 16

14 items in total

[1] Subtask Prior-Driven Optimized Mechanism on Joint Video Moment Retrieval and Highlight Detection
Zhou, Siyu
Zhang, Fuwei
Wang, Ruomei
Zhou, Fan
Su, Zhuo
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (11) : 11271 - 11285
[2] MIM: LIGHTWEIGHT MULTI-MODAL INTERACTION MODEL FOR JOINT VIDEO MOMENT RETRIEVAL AND HIGHLIGHT DETECTION
Li, Jinyu
Zhang, Fuwei
Lin, Shujin
Zhou, Fan
Wang, Ruomei
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1961 - 1966
[3] Fine-Grained Modality Relation-Aware Network for Video Moment Retrieval
Zhao, Yibo
Gao, Zan
Ma, Chunjie
Guan, Weili
Wang, Riwei
Chen, Shengyong
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (04) : 3315 - 3327
[4] Query-aware video encoder for video moment retrieval
Hao, Jiachang
Sun, Haifeng
Ren, Pengfei
Wang, Jingyu
Qi, Qi
Liao, Jianxin
NEUROCOMPUTING, 2022, 483 : 72 - 86
[5] MLLM as video narrator: Mitigating modality imbalance in video moment retrieval
Cai, Weitong
Huang, Jiabo
Gong, Shaogang
Jin, Hailin
Liu, Yang
PATTERN RECOGNITION, 2025, 166
[6] Video Moment Retrieval via Comprehensive Relation-Aware Network
Sun, Xin
Gao, Jialin
Zhu, Yizhe
Wang, Xuan
Zhou, Xi
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 5281 - 5295
[7] Time-Frequency Mutual Learning for Moment Retrieval and Highlight Detection
Zhong, Yaokun
Liang, Tianming
Hu, Jian-Fang
PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024, 2025, 15035 : 34 - 48
[8] GPTSee: Enhancing Moment Retrieval and Highlight Detection via Description-Based Similarity Features
Sun, Yunzhuo
Xu, Yifang
Xie, Zien
Shu, Yukun
Du, Sidan
IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 521 - 525
[9] Boundary-Aware Noise-Resistant Video Moment Retrieval
Yu, Fengzhen
Gu, Xiaodong
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT III, 2024, 15018 : 193 - 206
[10] Temporal refinement and multi-grained matching for moment retrieval and highlight detection
Zhu, Cunjuan
Zhang, Yanyi
Jia, Qi
Wang, Weimin
Liu, Yu
MULTIMEDIA SYSTEMS, 2025, 31 (01)
