Zero-Shot Video Grounding With Pseudo Query Lookup and Verification

Cited by: 5
|
Authors
Lu, Yu [1 ]
Quan, Ruijie [2 ]
Zhu, Linchao [2 ]
Yang, Yi [2 ]
Affiliations
[1] Univ Technol Sydney, Australian Artificial Intelligence Inst, ReLER Lab, Ultimo, NSW 2007, Australia
[2] Zhejiang Univ, Coll Comp Sci & Technol, CCAI, Hangzhou 310027, Peoples R China
Funding
Australian Research Council;
Keywords
Grounding; Detectors; Proposals; Training; Task analysis; Visualization; Semantics; Video grounding; zero-shot learning; vision and language; NETWORK; LOCALIZATION;
DOI
10.1109/TIP.2024.3365249
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video grounding, the process of identifying a specific moment in an untrimmed video based on a natural language query, has become a popular topic in video understanding. However, fully supervised learning approaches for video grounding that require large amounts of annotated data can be expensive and time-consuming. Recently, zero-shot video grounding (ZS-VG) methods that leverage pre-trained object detectors and language models to generate pseudo-supervision for training video grounding models have been developed. However, these approaches have limitations in recognizing diverse categories and capturing specific dynamics and interactions in the video context. To tackle these challenges, we introduce a novel two-stage ZS-VG framework called Lookup-and-Verification (LoVe), which treats the pseudo-query generation procedure as a video-to-concept retrieval problem. Our approach allows for the extraction of diverse concepts from an open-concept pool and employs a verification process to ensure the relevance of the retrieved concepts to the objects or events of interest in the video proposals. Comprehensive experimental results on the Charades-STA, ActivityNet-Captions, and DiDeMo datasets demonstrate the effectiveness of the LoVe framework.
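The two stages described in the abstract, retrieving candidate concepts from an open pool for each video proposal (lookup) and then filtering out weakly matching ones (verification), can be sketched as a similarity search followed by a threshold check. This is a minimal illustrative sketch, not the paper's implementation: the embeddings, the `top_k` value, and the `verify_thresh` cutoff are all assumptions introduced here.

```python
import numpy as np

def lookup_and_verify(proposal_emb, concept_embs, concept_names,
                      top_k=5, verify_thresh=0.3):
    """Lookup: retrieve the top-k concepts nearest to a video-proposal
    embedding. Verification: keep only concepts whose cosine similarity
    clears a threshold, so pseudo-queries stay relevant to the proposal.
    (Illustrative sketch; parameters are assumptions, not the paper's.)"""
    # Normalize so dot products are cosine similarities.
    p = proposal_emb / np.linalg.norm(proposal_emb)
    c = concept_embs / np.linalg.norm(concept_embs, axis=1, keepdims=True)
    sims = c @ p
    # Lookup stage: nearest concepts in the open concept pool.
    top = np.argsort(-sims)[:top_k]
    # Verification stage: discard concepts that match only weakly.
    return [(concept_names[i], float(sims[i]))
            for i in top if sims[i] >= verify_thresh]

# Toy pool: the proposal embedding lies close to the "person" concepts,
# so "dog" is retrieved last and rejected by the top-k cut.
pool = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
names = ["person", "dog", "person running"]
kept = lookup_and_verify(np.array([1.0, 0.1]), pool, names,
                         top_k=2, verify_thresh=0.3)
```

The retained concept names (here `"person"` and `"person running"`) would then seed pseudo-query generation for training the grounding model.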
Pages: 1643-1654
Page count: 12
Related Papers
50 items in total
  • [1] Exploiting Prior Tacit Knowledge to Enhance Alignment and Verification in zero-shot video grounding
    Wang, Jing
    Zhao, Xianbing
    Wang, Xiaojie
    Feng, Fangxiang
    NEUROCOMPUTING, 2025, 639
  • [2] Zero-Shot Video Moment Retrieval With Angular Reconstructive Text Embeddings
    Jiang, Xun
    Xu, Xing
    Zhou, Zailei
    Yang, Yang
    Shen, Fumin
    Shen, Heng Tao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 9657 - 9670
  • [3] Zero-Shot Video Grounding for Automatic Video Understanding in Sustainable Smart Cities
    Wang, Ping
    Sun, Li
    Wang, Liuan
    Sun, Jun
    SUSTAINABILITY, 2023, 15 (01)
  • [4] Fine-Grained Feature Generation for Generalized Zero-Shot Video Classification
    Hong, Mingyao
    Zhang, Xinfeng
    Li, Guorong
    Huang, Qingming
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 1599 - 1612
  • [5] Incremental Zero-Shot Learning
    Wei, Kun
    Deng, Cheng
    Yang, Xu
    Tao, Dacheng
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (12) : 13788 - 13799
  • [6] Spherical Zero-Shot Learning
    Shen, Jiayi
    Xiao, Zehao
    Zhen, Xiantong
    Zhang, Lei
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (02) : 634 - 645
  • [7] Generative Mixup Networks for Zero-Shot Learning
    Xu, Bingrong
    Zeng, Zhigang
    Lian, Cheng
    Ding, Zhengming
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022,
  • [8] CI-GNN: Building a Category-Instance Graph for Zero-Shot Video Classification
    Gao, Junyu
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (12) : 3088 - 3100
  • [9] Visual Semantic Segmentation Based on Few/Zero-Shot Learning: An Overview
    Ren, Wenqi
    Tang, Yang
    Sun, Qiyu
    Zhao, Chaoqiang
    Han, Qing-Long
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11 (05) : 1106 - 1126
  • [10] Multi-Modal Multi-Grained Embedding Learning for Generalized Zero-Shot Video Classification
    Hong, Mingyao
    Zhang, Xinfeng
    Li, Guorong
    Huang, Qingming
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (10) : 5959 - 5972