Action-guided prompt tuning for video grounding

Authors
Wang, Jing [1]
Tsao, Raymon [2]
Wang, Xuan [1]
Wang, Xiaojie [1]
Feng, Fangxiang [1]
Tian, Shiyu [1]
Poria, Soujanya [3]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Xitucheng Rd 10, Beijing 100876, Peoples R China
[2] Peking Univ, 5 Yiheyuan Rd, Beijing 100871, Peoples R China
[3] Singapore Univ Technol & Design, Sch Informat Syst Technol & Design, 8 Somapah Rd, Singapore 487372, Singapore
Funding
National Natural Science Foundation of China
Keywords
Video grounding; Multi-modal learning; Prompt tuning; Temporal information
DOI
10.1016/j.inffus.2024.102577
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video grounding aims to locate a moment of interest that semantically corresponds to a given query. We claim that existing methods overlook two critical issues: (1) the sparsity of language, and (2) the human perception process of events. Specifically, previous studies forcibly map the video and language modalities into a joint space for alignment, disregarding their inherent disparities. Verbs play a crucial role in queries, providing discriminative information for distinguishing different videos. However, in the video modality, actions, especially salient ones, typically span many frames and therefore carry far richer detail, whereas in the query each verb is confined to a single word representation. This discrepancy reveals significant sparsity in the language features and makes naively mapping the two modalities into a shared space suboptimal. Furthermore, segmenting ongoing activity into meaningful events is integral to human perception and contributes to event memory, yet previous methods fail to account for this essential perception process. To address these issues, we propose a novel Action-Guided Prompt Tuning (AGPT) method for video grounding. First, we design a Prompt Exploration module that explores latent expansion information for the salient verbs in the query, thereby reducing language feature sparsity and facilitating cross-modal matching. Second, we design an auxiliary action temporal prediction task for video grounding and introduce a temporal rank loss function that simulates the human perceptual system's segmentation of events, making AGPT temporally aware. Our approach can be seamlessly integrated into any video grounding model with minimal additional parameters. Extensive ablation experiments on three backbones and three datasets demonstrate the superiority of our method.
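The abstract names a temporal rank loss but does not give its form. As a rough illustration only, one common way to encode an ordering constraint is a pairwise hinge (margin ranking) loss. The sketch below is not the AGPT loss: the function name, the `margin` parameter, and the assumption that each action receives one scalar predicted time are all hypothetical.

```python
from itertools import combinations
from typing import Sequence


def pairwise_temporal_rank_loss(scores: Sequence[float], margin: float = 0.1) -> float:
    """Hinge-style ranking loss over predicted temporal positions.

    Assumption (not from the paper): scores[k] is a predicted time, e.g.
    normalized to [0, 1], for the k-th action, and the actions are listed
    in their ground-truth chronological order.
    """
    pairs = list(combinations(range(len(scores)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        # Action i precedes action j, so we want scores[j] >= scores[i] + margin.
        # Any shortfall is penalized linearly.
        total += max(0.0, margin + scores[i] - scores[j])
    return total / len(pairs)


# Predictions that respect the true order incur no penalty;
# reversed predictions are penalized.
loss_ordered = pairwise_temporal_rank_loss([0.1, 0.5, 0.9])
loss_reversed = pairwise_temporal_rank_loss([0.9, 0.5, 0.1])
```

In this sketch the penalty grows with how far a pair of predictions is out of order, which is one way a model could be pushed toward being temporally aware. The paper's actual formulation may differ.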
Pages: 10