Zero-Shot Video Grounding With Pseudo Query Lookup and Verification
Cited by: 5
Authors:
Lu, Yu [1]; Quan, Ruijie [2]; Zhu, Linchao [2]; Yang, Yi [2]
Affiliations:
[1] Univ Technol Sydney, Australian Artificial Intelligence Inst, ReLER Lab, Ultimo, NSW 2007, Australia
[2] Zhejiang Univ, Coll Comp Sci & Technol, CCAI, Hangzhou 310027, Peoples R China
Funding:
Australian Research Council;
Keywords:
Grounding; Detectors; Proposals; Training; Task analysis; Visualization; Semantics; Video grounding; zero-shot learning; vision and language; NETWORK; LOCALIZATION
DOI: 10.1109/TIP.2024.3365249
CLC number: TP18 [Theory of artificial intelligence];
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
Video grounding, the process of identifying a specific moment in an untrimmed video based on a natural language query, has become a popular topic in video understanding. However, fully supervised learning approaches for video grounding that require large amounts of annotated data can be expensive and time-consuming. Recently, zero-shot video grounding (ZS-VG) methods that leverage pre-trained object detectors and language models to generate pseudo-supervision for training video grounding models have been developed. However, these approaches have limitations in recognizing diverse categories and capturing specific dynamics and interactions in the video context. To tackle these challenges, we introduce a novel two-stage ZS-VG framework called Lookup-and-Verification (LoVe), which treats the pseudo-query generation procedure as a video-to-concept retrieval problem. Our approach allows for the extraction of diverse concepts from an open-concept pool and employs a verification process to ensure the relevance of the retrieved concepts to the objects or events of interest in the video proposals. Comprehensive experimental results on the Charades-STA, ActivityNet-Captions, and DiDeMo datasets demonstrate the effectiveness of the LoVe framework.
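The abstract's two-stage idea can be illustrated with a minimal, hypothetical sketch: a "lookup" step retrieves candidate concepts from an open pool by similarity to a video-proposal embedding, and a "verification" step keeps only candidates whose relevance clears a threshold. All names, embeddings, and thresholds below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lookup_and_verify(proposal_emb, concept_embs, concept_names,
                      top_k=3, verify_thresh=0.5):
    """Hypothetical sketch of a lookup-and-verification pipeline.

    proposal_emb: embedding of one video proposal, shape (d,)
    concept_embs: embeddings of the open concept pool, shape (n, d)
    concept_names: the n concept labels corresponding to concept_embs
    """
    # Normalize so that dot products become cosine similarities.
    p = proposal_emb / np.linalg.norm(proposal_emb)
    c = concept_embs / np.linalg.norm(concept_embs, axis=1, keepdims=True)
    sims = c @ p  # similarity of each pool concept to the proposal

    # Lookup: retrieve the top-k candidate concepts from the open pool.
    top = np.argsort(sims)[::-1][:top_k]

    # Verification: keep only candidates whose similarity to the
    # proposal clears the relevance threshold.
    return [concept_names[i] for i in top if sims[i] >= verify_thresh]
```

In a real system the embeddings would come from a pre-trained vision-language model, and the verified concepts would then be assembled into pseudo queries for training the grounding model.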
Pages: 1643-1654
Page count: 12