Point-Supervised Video Temporal Grounding

Cited by: 9

Authors:

Xu, Zhe [1]

Wei, Kun [1]

Yang, Xu [1]

Deng, Cheng [1]

Affiliations:

[1] Xidian Univ, Sch Elect Engn, Xian 710071, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2023 / Vol. 25

Funding:

National Natural Science Foundation of China;

Keywords:

Cross-modal contrast; multi-level distribution calibration; point supervision; video temporal grounding; NETWORK;

DOI:

10.1109/TMM.2022.3205404

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Given an untrimmed video and a language query, Video Temporal Grounding (VTG) aims to locate the time interval in the video that is semantically relevant to the query. Existing fully-supervised VTG methods require accurate temporal boundary annotations, which are time-consuming and expensive to obtain. On the other hand, weakly-supervised VTG methods, for which only paired videos and queries are available during training, lag far behind the fully-supervised ones. In this paper, we introduce point supervision to narrow the performance gap at an affordable annotation cost and propose a novel method dubbed Point-Supervised Video Temporal Grounding (PS-VTG). Specifically, an attention-based grounding network is first employed to obtain a language activation sequence (LAS). Then, a pseudo segment-level label is generated based on the LAS and the given point supervision to assist the training process. In addition, multi-level distribution calibration and cross-modal contrast are framed to obtain discriminative feature representations and precisely highlight the language-relevant video segments. Experiments on three benchmarks demonstrate that our method, trained with point supervision, can significantly outperform weakly-supervised approaches and achieve performance comparable to fully-supervised ones.

Pages: 6121-6131

Page count: 11

50 items in total

[1] Prototype contrastive learning for point-supervised temporal action detection
Li, Ping
Cao, Jiachen
Ye, Xingchao
EXPERT SYSTEMS WITH APPLICATIONS, 2023, 213
[2] Point-supervised temporal action localisation based on multi-branch attention
Liu, Shuai
Zhang, Yang
Srivastava, Gautam
ENTERPRISE INFORMATION SYSTEMS, 2023, 17 (11)
[3] SQL-Net: Semantic Query Learning for Point-Supervised Temporal Action Localization
Wang, Yu
Zhao, Shengjie
Chen, Shiwei
IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 84 - 94
[4] Stepwise Multi-grained Boundary Detector for Point-Supervised Temporal Action Localization
Liu, Mengnan
Wang, Le
Zhou, Sanping
Xia, Kun
Wu, Qi
Zhang, Qilin
Hua, Gang
COMPUTER VISION-ECCV 2024, PT VII, 2025, 15065 : 333 - 349
[5] POTLoc: Pseudo-label Oriented Transformer for point-supervised temporal Action Localization
Vahdani, Elahe
Tian, Yingli
COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 246
[6] HR-Pro: Point-Supervised Temporal Action Localization via Hierarchical Reliability Propagation
Zhang, Huaxin
Wang, Xiang
Xu, Xiaohao
Qing, Zhiwu
Gao, Changxin
Sang, Nong
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 7115 - 7123
[7] ProTeGe: Untrimmed Pretraining for Video Temporal Grounding by Video Temporal Grounding
Wang, Lan
Mittal, Gaurav
Sajeev, Sandra
Yu, Ye
Hall, Matthew
Boddeti, Vishnu Naresh
Chen, Mei
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6575 - 6585
[8] Just a Hint: Point-Supervised Camouflaged Object Detection
Chen, Huafeng
Shao, Dian
Guo, Guangqian
Gao, Shan
COMPUTER VISION-ECCV 2024, PT XXXV, 2025, 15093 : 332 - 348
[9] TextPolyp: Point-Supervised Polyp Segmentation with Text Cues
Zhao, Yiming
Zhou, Yi
Zhang, Yizhe
Wu, Ye
Zhou, Tao
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT XI, 2024, 15011 : 711 - 722
[10] MKP-Net: Memory knowledge propagation network for point-supervised temporal action localization in livestreaming
Chen, Lin
Zhang, Jing
Zhang, Yian
Kang, Junpeng
Zhuo, Li
COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 248
