Zero-Shot Video Grounding With Pseudo Query Lookup and Verification

Cited by: 5
Authors
Lu, Yu [1 ]
Quan, Ruijie [2 ]
Zhu, Linchao [2 ]
Yang, Yi [2 ]
Affiliations
[1] Univ Technol Sydney, Australian Artificial Intelligence Inst, ReLER Lab, Ultimo, NSW 2007, Australia
[2] Zhejiang Univ, Coll Comp Sci & Technol, CCAI, Hangzhou 310027, Peoples R China
Funding
Australian Research Council;
Keywords
Grounding; Detectors; Proposals; Training; Task analysis; Visualization; Semantics; Video grounding; zero-shot learning; vision and language; NETWORK; LOCALIZATION;
DOI
10.1109/TIP.2024.3365249
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video grounding, the task of localizing a specific moment in an untrimmed video from a natural language query, has become a popular topic in video understanding. However, fully supervised approaches to video grounding require large amounts of annotated data, which is expensive and time-consuming to collect. Recently, zero-shot video grounding (ZS-VG) methods have been developed that leverage pre-trained object detectors and language models to generate pseudo-supervision for training grounding models. These approaches, however, are limited in recognizing diverse categories and in capturing the specific dynamics and interactions in the video context. To tackle these challenges, we introduce a novel two-stage ZS-VG framework called Lookup-and-Verification (LoVe), which treats pseudo-query generation as a video-to-concept retrieval problem. Our approach extracts diverse concepts from an open-concept pool and employs a verification process to ensure the relevance of the retrieved concepts to the objects or events of interest in the video proposals. Comprehensive experimental results on the Charades-STA, ActivityNet-Captions, and DiDeMo datasets demonstrate the effectiveness of the LoVe framework.
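To make the two-stage idea concrete, below is a minimal Python sketch of a Lookup-and-Verification style pseudo-query generator, assuming a CLIP-style joint vision-language embedding space. All function names, the toy concept pool, and the similarity threshold are hypothetical illustrations based on the abstract, not the paper's actual implementation.

# Minimal sketch of a Lookup-and-Verification style pseudo-query pipeline,
# assuming a CLIP-style joint vision-language embedding space. All names,
# pools, and thresholds are hypothetical, not the authors' implementation.
import numpy as np

def cosine_sim(query, candidates):
    # Cosine similarity between one vector and a matrix of row vectors.
    query = query / np.linalg.norm(query)
    candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return candidates @ query

def lookup(proposal_emb, concept_embs, concepts, top_k=3):
    # Stage 1 (lookup): treat pseudo-query generation as video-to-concept
    # retrieval, returning the top-k concepts for one video proposal.
    sims = cosine_sim(proposal_emb, concept_embs)
    order = np.argsort(-sims)[:top_k]
    return [(concepts[i], float(sims[i])) for i in order]

def verify(candidates, min_sim=0.0):
    # Stage 2 (verification): keep only concepts whose similarity to the
    # proposal clears a relevance threshold.
    return [concept for concept, sim in candidates if sim >= min_sim]

# Toy usage; in practice the embeddings would come from a pretrained
# encoder applied to proposal frames and to phrases in an open-concept pool.
rng = np.random.default_rng(0)
concepts = ["person opens door", "dog runs outside", "person cooks food"]
concept_embs = rng.normal(size=(len(concepts), 512))
proposal_emb = rng.normal(size=512)

verified = verify(lookup(proposal_emb, concept_embs, concepts))
pseudo_query = " ".join(verified) if verified else None
print(pseudo_query)  # pseudo-supervision for training the grounding model

The key design point this sketch reflects is the one the abstract emphasizes: retrieving from an open-concept pool, rather than relying on a detector's closed vocabulary, and then verifying relevance against the proposal before the concept is allowed into the pseudo query.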
Pages: 1643 - 1654
Page count: 12
Related Papers
50 items in total
  • [31] Domain Shift Preservation for Zero-Shot Domain Adaptation
    Wang, Jinghua
    Cheng, Ming-Ming
    Jiang, Jianmin
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 5505 - 5517
  • [32] Diversity-Boosted Generalization-Specialization Balancing for Zero-Shot Learning
    Li, Yun
    Liu, Zhe
    Chang, Xiaojun
    McAuley, Julian
    Yao, Lina
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8372 - 8382
  • [33] Zero-Shot Modulation Recognition via Knowledge-Informed Waveform Description
    Chen, Ying
    Wang, Xiang
    Huang, Zhitao
    IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 21 - 25
  • [34] MFHI: Taking Modality-Free Human Identification as Zero-Shot Learning
    Liu, Zhizhe
    Zhang, Xingxing
    Zhu, Zhenfeng
    Zheng, Shuai
    Zhao, Yao
    Cheng, Jian
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (08) : 5225 - 5237
  • [35] E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation
    Bao, Peijun
    Shao, Zihao
    Yang, Wenhan
    Ng, Boon Poh
    Kot, Alex C.
    COMPUTER VISION - ECCV 2024, PT LXXXIII, 2025, 15141 : 227 - 243
  • [36] Zero-Shot Learning via Discriminative Dual Semantic Auto-Encoder
    Xing, Nan
    Liu, Yang
    Zhu, Hong
    Wang, Jing
    Han, Jungong
    IEEE ACCESS, 2021, 9 : 733 - 742
  • [37] Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning
    Li, Wenrui
    Wang, Penghong
    Xiong, Ruiqin
    Fan, Xiaopeng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 4840 - 4852
  • [38] Dual-Verification Network for Zero-Shot Learning
    Zhang, Haofeng
    Long, Yang
    Yang, Wankou
    Shao, Ling
    INFORMATION SCIENCES, 2019, 470 : 43 - 57
  • [39] Grounding Visual Concepts for Zero-Shot Event Detection and Event Captioning
    Li, Zhihui
    Chang, Xiaojun
    Yao, Lina
    Pan, Shirui
    Ge, Zongyuan
    Zhang, Huaxiang
    KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, : 297 - 305
  • [40] Bidirectional Mapping Coupled GAN for Generalized Zero-Shot Learning
    Shermin, Tasfia
    Teng, Shyh Wei
    Sohel, Ferdous
    Murshed, Manzur
    Lu, Guojun
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 721 - 733