Action-guided prompt tuning for video grounding

Cited by: 0
Authors
Wang, Jing [1]
Tsao, Raymon [2]
Wang, Xuan [1]
Wang, Xiaojie [1]
Feng, Fangxiang [1]
Tian, Shiyu [1]
Poria, Soujanya [3]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Xitucheng Rd 10, Beijing 100876, Peoples R China
[2] Peking Univ, 5 Yiheyuan Rd, Beijing 100871, Peoples R China
[3] Singapore Univ Technol & Design, Sch Informat Syst Technol & Design, 8 Somapah Rd, Singapore 487372, Singapore
Funding
National Natural Science Foundation of China
Keywords
Video grounding; Multi-modal learning; Prompt tuning; Temporal information
DOI
10.1016/j.inffus.2024.102577
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Video grounding aims to locate a moment of interest that semantically corresponds to a given query. We claim that existing methods overlook two critical issues: (1) the sparsity of language, and (2) the human perception process of events. Specifically, previous studies forcibly map the video and language modalities into a joint space for alignment, disregarding their inherent disparities. Verbs play a crucial role in queries, providing discriminative information for distinguishing different videos. However, in the video modality, actions, especially salient ones, typically span many frames and therefore carry much richer detail. At the query level, by contrast, a verb is confined to a single-word representation, creating a disparity. This discrepancy reveals significant sparsity in language features, which makes naively mapping the two modalities into a shared space suboptimal. Furthermore, segmenting ongoing activity into meaningful events is integral to human perception and contributes to event memory. Previous methods fail to account for this essential perception process. To address these issues, we propose a novel Action-Guided Prompt Tuning (AGPT) method for video grounding. First, we design a Prompt Exploration module that explores latent expansion information for salient verbs in the language query, thereby reducing language feature sparsity and facilitating cross-modal matching. Second, we design an auxiliary action temporal prediction task for video grounding and introduce a temporal rank loss function that simulates the human perceptual system's segmentation of events, making AGPT temporally aware. Our approach can be seamlessly integrated into any video grounding model with minimal additional parameters. Extensive ablation experiments on three backbones and three datasets demonstrate the superiority of our method.
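This record does not give the formula for the temporal rank loss mentioned in the abstract. As a rough illustration of the general idea only, the sketch below uses a generic pairwise hinge ranking loss, which may differ from the AGPT loss: per-frame action scores inside an annotated span are pushed above scores outside it. The function name, the `margin` value, and the frame-index span convention are all assumptions made for this example.

```python
from typing import Sequence


def pairwise_temporal_rank_loss(
    scores: Sequence[float], start: int, end: int, margin: float = 0.1
) -> float:
    """Generic hinge ranking loss (illustrative; not the paper's definition).

    Penalizes every (inside, outside) frame pair in which the frame inside
    the annotated span [start, end] (inclusive indices, an assumed
    convention) does not outscore the outside frame by at least `margin`.
    """
    inside = [s for i, s in enumerate(scores) if start <= i <= end]
    outside = [s for i, s in enumerate(scores) if i < start or i > end]
    if not inside or not outside:
        return 0.0  # no pairs to rank
    total = sum(max(0.0, margin - (p - n)) for p in inside for n in outside)
    return total / (len(inside) * len(outside))


# Hypothetical per-frame scores for a 4-frame clip whose action spans frames 1..2.
well_ranked = pairwise_temporal_rank_loss([0.1, 0.9, 0.8, 0.2], start=1, end=2)
badly_ranked = pairwise_temporal_rank_loss([0.9, 0.1, 0.1, 0.9], start=1, end=2)
```

In this toy setting the loss is zero when every in-span frame beats every out-of-span frame by the margin, and it grows as the ordering is violated. A learned model would typically compute the same quantity on tensors so that gradients can flow.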
Pages: 10
Related papers
50 in total
  • [41] Zero-Shot Video Grounding for Automatic Video Understanding in Sustainable Smart Cities
    Wang, Ping
    Sun, Li
    Wang, Liuan
    Sun, Jun
    SUSTAINABILITY, 2023, 15 (01)
  • [42] Context-aware generative prompt tuning for relation extraction
    Liu, Xiaoyong
    Wen, Handong
    Xu, Chunlin
    Du, Zhiguo
    Li, Huihui
    Hu, Miao
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (12) : 5495 - 5508
  • [43] Prompt tuning discriminative language models for hierarchical text classification
    du Toit, Jaco
    Dunaiski, Marcel
    NATURAL LANGUAGE PROCESSING, 2024,
  • [44] Prompt-Ladder: Memory-efficient prompt tuning for vision-language models on edge devices
    Cai, Siqi
    Liu, Xuan
    Yuan, Jingling
    Zhou, Qihua
    PATTERN RECOGNITION, 2025, 163
  • [45] Meta-prompt tuning for low-resource visual question answering
    Mingwen Shao
    Yuanyuan Liu
    Lingzhuang Meng
    Xun Shao
    Multimedia Systems, 2025, 31 (4)
  • [46] APRE: Annotation-Aware Prompt-Tuning for Relation Extraction
    Wei, Chao
    Chen, Yanping
    Wang, Kai
    Qin, Yongbin
    Huang, Ruizhang
    Zheng, Qinghua
    NEURAL PROCESSING LETTERS, 2024, 56 (02)
  • [47] Retrieval-Enhanced Event Temporal Relation Extraction by Prompt Tuning
    Luo, Rong
    Hu, Po
    WEB AND BIG DATA, PT IV, APWEB-WAIM 2023, 2024, 14334 : 16 - 30
  • [48] SimEmotion: A Simple Knowledgeable Prompt Tuning Method for Image Emotion Classification
    Deng, Sinuo
    Shi, Ge
    Wu, Lifang
    Xing, Lehao
    Hu, Wenjin
    Zhang, Heng
    Xiang, Ye
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2022, PT III, 2022, : 222 - 229
  • [49] Domain Prompt Tuning via Meta Relabeling for Unsupervised Adversarial Adaptation
    Jin, Xin
    Lan, Cuiling
    Zeng, Wenjun
    Chen, Zhibo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 8333 - 8347
  • [50] Automating Method Naming with Context-Aware Prompt-Tuning
    Zhu, Jie
    Li, Lingwei
    Yang, Li
    Ma, Xiaoxiao
    Zuo, Chun
    2023 IEEE/ACM 31ST INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION, ICPC, 2023, : 203 - 214