Zero-Shot Video Grounding With Pseudo Query Lookup and Verification

Cited by: 5
Authors
Lu, Yu [1]
Quan, Ruijie [2]
Zhu, Linchao [2]
Yang, Yi [2]
Affiliations
[1] Univ Technol Sydney, Australian Artificial Intelligence Inst, ReLER Lab, Ultimo, NSW 2007, Australia
[2] Zhejiang Univ, Coll Comp Sci & Technol, CCAI, Hangzhou 310027, Peoples R China
Funding
Australian Research Council
Keywords
Grounding; Detectors; Proposals; Training; Task analysis; Visualization; Semantics; Video grounding; zero-shot learning; vision and language; Network; Localization
DOI
10.1109/TIP.2024.3365249
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Video grounding, the process of identifying a specific moment in an untrimmed video based on a natural language query, has become a popular topic in video understanding. However, fully supervised learning approaches for video grounding that require large amounts of annotated data can be expensive and time-consuming. Recently, zero-shot video grounding (ZS-VG) methods that leverage pre-trained object detectors and language models to generate pseudo-supervision for training video grounding models have been developed. However, these approaches have limitations in recognizing diverse categories and capturing specific dynamics and interactions in the video context. To tackle these challenges, we introduce a novel two-stage ZS-VG framework called Lookup-and-Verification (LoVe), which treats the pseudo-query generation procedure as a video-to-concept retrieval problem. Our approach allows for the extraction of diverse concepts from an open-concept pool and employs a verification process to ensure the relevance of the retrieved concepts to the objects or events of interest in the video proposals. Comprehensive experimental results on the Charades-STA, ActivityNet-Captions, and DiDeMo datasets demonstrate the effectiveness of the LoVe framework.
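The abstract describes LoVe only at a high level: pseudo queries are built by looking up concepts for a video proposal from an open-concept pool, then verifying that each retrieved concept is relevant to that proposal. The sketch below illustrates this general lookup-then-verify pattern under loudly stated assumptions. It is not the authors' implementation. It assumes precomputed embeddings for proposals and concepts in a shared vision-language space, uses cosine similarity for the lookup, and uses a hypothetical verification rule: a concept is kept only if it matches the proposal at least `margin` better than it matches the rest of the video. The names `lookup`, `verify`, `pseudo_query`, `k`, and `margin` are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lookup(proposal: Vector, pool: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Lookup step: rank every concept in the pool by similarity to the proposal, keep top-k."""
    scored = [(name, cosine(proposal, emb)) for name, emb in pool.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def verify(candidates: List[Tuple[str, float]], background: Vector,
           pool: Dict[str, Vector], margin: float) -> List[str]:
    """Verification step (hypothetical rule): keep a concept only if it matches the
    proposal at least `margin` better than it matches the video outside the proposal."""
    return [name for name, score in candidates
            if score - cosine(background, pool[name]) >= margin]


def pseudo_query(proposal: Vector, background: Vector, pool: Dict[str, Vector],
                 k: int = 3, margin: float = 0.1) -> str:
    """Join the verified concepts into a bag-of-concepts pseudo query."""
    return " ".join(verify(lookup(proposal, pool, k), background, pool, margin))


# Toy 3-D "embeddings"; real ones would come from a vision-language encoder.
pool = {
    "person": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "running": [0.7, 0.7, 0.0],
    "table": [0.5, 0.0, 0.9],
    "kitchen": [0.0, 0.0, 1.0],
}
proposal = [1.0, 0.6, 0.0]    # pooled feature of the candidate moment
background = [0.2, 0.0, 1.0]  # pooled feature of the rest of the video

print(pseudo_query(proposal, background, pool, k=4))  # → running person dog
```

In this toy run, `table` survives the lookup (it is the fourth-ranked concept) but fails verification because it matches the background more strongly than the proposal. A fuller pipeline might also compose the surviving concepts into a fluent sentence; that step is omitted here.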
Pages: 1643-1654
Number of pages: 12