Zero-Shot Video Grounding With Pseudo Query Lookup and Verification

Cited by: 5
Authors
Lu, Yu [1 ]
Quan, Ruijie [2 ]
Zhu, Linchao [2 ]
Yang, Yi [2 ]
Affiliations
[1] Univ Technol Sydney, Australian Artificial Intelligence Inst, ReLER Lab, Ultimo, NSW 2007, Australia
[2] Zhejiang Univ, Coll Comp Sci & Technol, CCAI, Hangzhou 310027, Peoples R China
Funding
Australian Research Council;
Keywords
Grounding; Detectors; Proposals; Training; Task analysis; Visualization; Semantics; Video grounding; zero-shot learning; vision and language; NETWORK; LOCALIZATION;
DOI
10.1109/TIP.2024.3365249
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video grounding, the task of localizing a specific moment in an untrimmed video from a natural language query, has become a popular topic in video understanding. However, fully supervised approaches to video grounding require large amounts of annotated data, which is expensive and time-consuming to collect. Recently, zero-shot video grounding (ZS-VG) methods have been developed that leverage pre-trained object detectors and language models to generate pseudo-supervision for training grounding models. These approaches, however, are limited in recognizing diverse categories and in capturing the specific dynamics and interactions in the video context. To tackle these challenges, we introduce a novel two-stage ZS-VG framework called Lookup-and-Verification (LoVe), which treats pseudo-query generation as a video-to-concept retrieval problem. Our approach extracts diverse concepts from an open-concept pool and employs a verification process to ensure the relevance of the retrieved concepts to the objects or events of interest in the video proposals. Comprehensive experimental results on the Charades-STA, ActivityNet-Captions, and DiDeMo datasets demonstrate the effectiveness of the LoVe framework.
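To make the two-stage idea concrete, below is a minimal Python sketch of a Lookup-and-Verification style pseudo-query generator, assuming a CLIP-style joint vision-language embedding space. All function names, the toy concept pool, and the similarity threshold are hypothetical illustrations based on the abstract, not the paper's actual implementation.

# Minimal sketch of a Lookup-and-Verification style pseudo-query pipeline,
# assuming a CLIP-style joint vision-language embedding space. All names,
# pools, and thresholds are hypothetical, not the authors' implementation.
import numpy as np

def cosine_sim(query, candidates):
    # Cosine similarity between one vector and a matrix of row vectors.
    query = query / np.linalg.norm(query)
    candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return candidates @ query

def lookup(proposal_emb, concept_embs, concepts, top_k=3):
    # Stage 1 (lookup): treat pseudo-query generation as video-to-concept
    # retrieval, returning the top-k concepts for one video proposal.
    sims = cosine_sim(proposal_emb, concept_embs)
    order = np.argsort(-sims)[:top_k]
    return [(concepts[i], float(sims[i])) for i in order]

def verify(candidates, min_sim=0.0):
    # Stage 2 (verification): keep only concepts whose similarity to the
    # proposal clears a relevance threshold.
    return [concept for concept, sim in candidates if sim >= min_sim]

# Toy usage; in practice the embeddings would come from a pretrained
# encoder applied to proposal frames and to phrases in an open-concept pool.
rng = np.random.default_rng(0)
concepts = ["person opens door", "dog runs outside", "person cooks food"]
concept_embs = rng.normal(size=(len(concepts), 512))
proposal_emb = rng.normal(size=512)

verified = verify(lookup(proposal_emb, concept_embs, concepts))
pseudo_query = " ".join(verified) if verified else None
print(pseudo_query)  # pseudo-supervision for training the grounding model

The key design point this sketch reflects is the one the abstract emphasizes: retrieving from an open-concept pool, rather than relying on a detector's closed vocabulary, and then verifying relevance against the proposal before the concept is allowed into the pseudo query.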
Pages: 1643 - 1654
Page count: 12
Related Papers
50 items in total
  • [31] Domain Shift Preservation for Zero-Shot Domain Adaptation
    Wang, Jinghua
    Cheng, Ming-Ming
    Jiang, Jianmin
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 5505 - 5517
  • [32] Diversity-Boosted Generalization-Specialization Balancing for Zero-Shot Learning
    Li, Yun
    Liu, Zhe
    Chang, Xiaojun
    McAuley, Julian
    Yao, Lina
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8372 - 8382
  • [33] Zero-Shot Modulation Recognition via Knowledge-Informed Waveform Description
    Chen, Ying
    Wang, Xiang
    Huang, Zhitao
    IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 21 - 25
  • [34] MFHI: Taking Modality-Free Human Identification as Zero-Shot Learning
    Liu, Zhizhe
    Zhang, Xingxing
    Zhu, Zhenfeng
    Zheng, Shuai
    Zhao, Yao
    Cheng, Jian
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (08) : 5225 - 5237
  • [35] E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation
    Bao, Peijun
    Shao, Zihao
    Yang, Wenhan
    Ng, Boon Poh
    Kot, Alex C.
    COMPUTER VISION - ECCV 2024, PT LXXXIII, 2025, 15141 : 227 - 243
  • [36] Zero-Shot Learning via Discriminative Dual Semantic Auto-Encoder
    Xing, Nan
    Liu, Yang
    Zhu, Hong
    Wang, Jing
    Han, Jungong
    IEEE ACCESS, 2021, 9 : 733 - 742
  • [37] Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning
    Li, Wenrui
    Wang, Penghong
    Xiong, Ruiqin
    Fan, Xiaopeng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 4840 - 4852
  • [38] Dual-Verification Network for Zero-Shot Learning
    Zhang, Haofeng
    Long, Yang
    Yang, Wankou
    Shao, Ling
    INFORMATION SCIENCES, 2019, 470 : 43 - 57
  • [39] Grounding Visual Concepts for Zero-Shot Event Detection and Event Captioning
    Li, Zhihui
    Chang, Xiaojun
    Yao, Lina
    Pan, Shirui
    Ge, Zongyuan
    Zhang, Huaxiang
    KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, : 297 - 305
  • [40] Bidirectional Mapping Coupled GAN for Generalized Zero-Shot Learning
    Shermin, Tasfia
    Teng, Shyh Wei
    Sohel, Ferdous
    Murshed, Manzur
    Lu, Guojun
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 721 - 733