Zero-Shot Video Grounding With Pseudo Query Lookup and Verification

Cited by: 5

Authors:

Lu, Yu [1]

Quan, Ruijie [2]

Zhu, Linchao [2]

Yang, Yi [2]

Affiliations:

[1] Univ Technol Sydney, Australian Artificial Intelligence Inst, ReLER Lab, Ultimo, NSW 2007, Australia

[2] Zhejiang Univ, Coll Comp Sci & Technol, CCAI, Hangzhou 310027, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2024 / Volume 33

Funding:

Australian Research Council;

Keywords:

Grounding; Detectors; Proposals; Training; Task analysis; Visualization; Semantics; Video grounding; zero-shot learning; vision and language; NETWORK; LOCALIZATION;

DOI:

10.1109/TIP.2024.3365249

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video grounding, the process of identifying a specific moment in an untrimmed video based on a natural language query, has become a popular topic in video understanding. However, fully supervised learning approaches for video grounding that require large amounts of annotated data can be expensive and time-consuming. Recently, zero-shot video grounding (ZS-VG) methods that leverage pre-trained object detectors and language models to generate pseudo-supervision for training video grounding models have been developed. However, these approaches have limitations in recognizing diverse categories and capturing specific dynamics and interactions in the video context. To tackle these challenges, we introduce a novel two-stage ZS-VG framework called Lookup-and-Verification (LoVe), which treats the pseudo-query generation procedure as a video-to-concept retrieval problem. Our approach allows for the extraction of diverse concepts from an open-concept pool and employs a verification process to ensure the relevance of the retrieved concepts to the objects or events of interest in the video proposals. Comprehensive experimental results on the Charades-STA, ActivityNet-Captions, and DiDeMo datasets demonstrate the effectiveness of the LoVe framework.
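The two stages named in the abstract, looking up concepts for a video proposal and then verifying them, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the authors' implementation: the 3-dimensional embeddings are invented, cosine similarity stands in for whatever retrieval score LoVe uses, and the verification rule (keep a concept only if it matches the proposal clearly better than the rest of the video) and the names `lookup`, `verify`, `top_k`, and `margin` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lookup(proposal_vec, concept_pool, top_k=2):
    """Lookup stage: rank all concepts in the pool by similarity to the proposal."""
    ranked = sorted(
        concept_pool,
        key=lambda name: cosine(proposal_vec, concept_pool[name]),
        reverse=True,
    )
    return ranked[:top_k]


def verify(candidates, proposal_vec, context_vec, concept_pool, margin=0.1):
    """Verification stage (hypothetical rule): keep a concept only if it fits
    the proposal better than the surrounding video by more than `margin`."""
    kept = []
    for name in candidates:
        inside = cosine(proposal_vec, concept_pool[name])
        outside = cosine(context_vec, concept_pool[name])
        if inside - outside > margin:
            kept.append(name)
    return kept


# Invented toy embeddings; a real system would use a pre-trained
# vision-language encoder for both concepts and video segments.
concept_pool = {
    "open door": [1.0, 0.0, 0.0],
    "person": [0.6, 0.6, 0.0],
    "drink water": [0.0, 0.0, 1.0],
}
proposal_vec = [0.9, 0.3, 0.0]  # embedding of the candidate moment
context_vec = [0.3, 0.9, 0.1]   # embedding of the rest of the video

candidates = lookup(proposal_vec, concept_pool, top_k=2)
pseudo_query = verify(candidates, proposal_vec, context_vec, concept_pool)
print(candidates)    # ['open door', 'person']
print(pseudo_query)  # ['open door']
```

In this toy run, "person" is retrieved but then rejected because it matches the whole video about as well as the proposal, so it carries little information for localizing the moment. That is one plausible reading of what the abstract means by ensuring relevance to the proposal.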

Pages: 1643-1654

Page count: 12

