Point-Supervised Video Temporal Grounding

Cited by: 9
Authors
Xu, Zhe [1]
Wei, Kun [1]
Yang, Xu [1]
Deng, Cheng [1]
Affiliations
[1] Xidian Univ, Sch Elect Engn, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal contrast; multi-level distribution calibration; point supervision; video temporal grounding; NETWORK;
DOI
10.1109/TMM.2022.3205404
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Given an untrimmed video and a language query, Video Temporal Grounding (VTG) aims to locate the time interval in the video that is semantically relevant to the query. Existing fully-supervised VTG methods require accurate temporal boundary annotations, which are time-consuming and expensive to obtain. On the other hand, weakly-supervised VTG methods, for which only paired videos and queries are available during training, lag far behind the fully-supervised ones. In this paper, we introduce point supervision to narrow the performance gap at an affordable annotation cost and propose a novel method dubbed Point-Supervised Video Temporal Grounding (PS-VTG). Specifically, an attention-based grounding network is first employed to obtain a language activation sequence (LAS). Then, pseudo segment-level labels are generated based on the LAS and the given point supervision to assist the training process. In addition, multi-level distribution calibration and cross-modal contrast are framed to obtain discriminative feature representations and precisely highlight the language-relevant video segments. Experiments on three benchmarks demonstrate that our method trained with point supervision can significantly outperform weakly-supervised approaches and achieve performance comparable to fully-supervised ones.
Pages: 6121-6131
Number of pages: 11
Related papers
  • [1] Prototype contrastive learning for point-supervised temporal action detection
    Li, Ping
    Cao, Jiachen
    Ye, Xingchao
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 213
  • [2] Point-supervised temporal action localisation based on multi-branch attention
    Liu, Shuai
    Zhang, Yang
    Srivastava, Gautam
    ENTERPRISE INFORMATION SYSTEMS, 2023, 17 (11)
  • [3] SQL-Net: Semantic Query Learning for Point-Supervised Temporal Action Localization
    Wang, Yu
    Zhao, Shengjie
    Chen, Shiwei
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 84 - 94
  • [4] Stepwise Multi-grained Boundary Detector for Point-Supervised Temporal Action Localization
    Liu, Mengnan
    Wang, Le
    Zhou, Sanping
    Xia, Kun
    Wu, Qi
    Zhang, Qilin
    Hua, Gang
    COMPUTER VISION-ECCV 2024, PT VII, 2025, 15065 : 333 - 349
  • [5] POTLoc: Pseudo-label Oriented Transformer for point-supervised temporal Action Localization
    Vahdani, Elahe
    Tian, Yingli
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 246
  • [6] HR-Pro: Point-Supervised Temporal Action Localization via Hierarchical Reliability Propagation
    Zhang, Huaxin
    Wang, Xiang
    Xu, Xiaohao
    Qing, Zhiwu
    Gao, Changxin
    Sang, Nong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 7115 - 7123
  • [7] ProTeGe: Untrimmed Pretraining for Video Temporal Grounding by Video Temporal Grounding
    Wang, Lan
    Mittal, Gaurav
    Sajeev, Sandra
    Yu, Ye
    Hall, Matthew
    Boddeti, Vishnu Naresh
    Chen, Mei
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6575 - 6585
  • [8] Just a Hint: Point-Supervised Camouflaged Object Detection
    Chen, Huafeng
    Shao, Dian
    Guo, Guangqian
    Gao, Shan
    COMPUTER VISION-ECCV 2024, PT XXXV, 2025, 15093 : 332 - 348
  • [9] TextPolyp: Point-Supervised Polyp Segmentation with Text Cues
    Zhao, Yiming
    Zhou, Yi
    Zhang, Yizhe
    Wu, Ye
    Zhou, Tao
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT XI, 2024, 15011 : 711 - 722
  • [10] MKP-Net: Memory knowledge propagation network for point-supervised temporal action localization in livestreaming
    Chen, Lin
    Zhang, Jing
    Zhang, Yian
    Kang, Junpeng
    Zhuo, Li
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 248