Zero-Shot Video Grounding With Pseudo Query Lookup and Verification
Cited by: 5
Authors:
Lu, Yu [1]; Quan, Ruijie [2]; Zhu, Linchao [2]; Yang, Yi [2]
Affiliations:
[1] Univ Technol Sydney, Australian Artificial Intelligence Inst, ReLER Lab, Ultimo, NSW 2007, Australia
[2] Zhejiang Univ, Coll Comp Sci & Technol, CCAI, Hangzhou 310027, Peoples R China
Funding:
Australian Research Council;
Keywords:
Grounding; Detectors; Proposals; Training; Task analysis; Visualization; Semantics; Video grounding; zero-shot learning; vision and language; NETWORK; LOCALIZATION
DOI: 10.1109/TIP.2024.3365249
CLC number: TP18 [Theory of artificial intelligence];
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
Video grounding, the process of identifying a specific moment in an untrimmed video based on a natural language query, has become a popular topic in video understanding. However, fully supervised learning approaches for video grounding that require large amounts of annotated data can be expensive and time-consuming. Recently, zero-shot video grounding (ZS-VG) methods that leverage pre-trained object detectors and language models to generate pseudo-supervision for training video grounding models have been developed. However, these approaches have limitations in recognizing diverse categories and capturing specific dynamics and interactions in the video context. To tackle these challenges, we introduce a novel two-stage ZS-VG framework called Lookup-and-Verification (LoVe), which treats the pseudo-query generation procedure as a video-to-concept retrieval problem. Our approach allows for the extraction of diverse concepts from an open-concept pool and employs a verification process to ensure the relevance of the retrieved concepts to the objects or events of interest in the video proposals. Comprehensive experimental results on the Charades-STA, ActivityNet-Captions, and DiDeMo datasets demonstrate the effectiveness of the LoVe framework.
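The abstract's two-stage idea can be illustrated with a minimal, hypothetical sketch: a "lookup" step retrieves candidate concepts from an open pool by similarity to a video-proposal embedding, and a "verification" step keeps only candidates whose relevance clears a threshold. All names, embeddings, and thresholds below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lookup_and_verify(proposal_emb, concept_embs, concept_names,
                      top_k=3, verify_thresh=0.5):
    """Hypothetical sketch of a lookup-and-verification pipeline.

    proposal_emb: embedding of one video proposal, shape (d,)
    concept_embs: embeddings of the open concept pool, shape (n, d)
    concept_names: the n concept labels corresponding to concept_embs
    """
    # Normalize so that dot products become cosine similarities.
    p = proposal_emb / np.linalg.norm(proposal_emb)
    c = concept_embs / np.linalg.norm(concept_embs, axis=1, keepdims=True)
    sims = c @ p  # similarity of each pool concept to the proposal

    # Lookup: retrieve the top-k candidate concepts from the open pool.
    top = np.argsort(sims)[::-1][:top_k]

    # Verification: keep only candidates whose similarity to the
    # proposal clears the relevance threshold.
    return [concept_names[i] for i in top if sims[i] >= verify_thresh]
```

In a real system the embeddings would come from a pre-trained vision-language model, and the verified concepts would then be assembled into pseudo queries for training the grounding model.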
Pages: 1643-1654
Page count: 12