Local Correspondence Network for Weakly Supervised Temporal Sentence Grounding

Cited by: 54
Authors
Yang, Wenfei [1 ]
Zhang, Tianzhu [1 ]
Zhang, Yongdong [1 ]
Wu, Feng [1 ]
Affiliation
[1] Univ Sci & Technol China, Sch Informat Sci, Hefei 230027, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Grounding; Annotations; Two dimensional displays; Training; Feature extraction; Computational modeling; Task analysis; Weakly supervised; temporal sentence grounding;
DOI
10.1109/TIP.2021.3058614
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Weakly supervised temporal sentence grounding offers better scalability and practicability than fully supervised methods in real-world application scenarios. However, most existing methods cannot model fine-grained local video-text correspondences well and lack effective supervision for correspondence learning, which yields unsatisfactory performance. To address these issues, we propose an end-to-end Local Correspondence Network (LCNet) for weakly supervised temporal sentence grounding. The proposed LCNet has several merits. First, we represent video and text features hierarchically to model fine-grained video-text correspondences. Second, we design a self-supervised cycle-consistent loss that serves as learning guidance for video-text matching. To the best of our knowledge, this is the first work to fully explore the fine-grained correspondences between video and text for temporal sentence grounding using self-supervised learning. Extensive experimental results on two benchmark datasets demonstrate that the proposed LCNet significantly outperforms existing weakly supervised methods.
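The abstract does not give the form of the cycle-consistent loss. As a hedged illustration only, the sketch below implements a generic soft-nearest-neighbor cycle-back loss between word features and video-clip features: each word attends softly over the clips, the resulting clip summary attends back over the words, and the loss is the cross-entropy of returning to the starting word. The function names, the dot-product similarity, and the temperature `tau` are assumptions for illustration, not LCNet's published formulation.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity (assumed similarity measure)."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_nearest_neighbor(query: Vector, keys: List[Vector], tau: float) -> Vector:
    """Attention-weighted average of `keys`, weighted by similarity to `query`."""
    weights = softmax([dot(query, k) / tau for k in keys])
    dim = len(keys[0])
    return [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]


def cycle_consistency_loss(words: List[Vector], clips: List[Vector], tau: float = 0.1) -> float:
    """Hypothetical word -> clip -> word cycle loss, averaged over words.

    For each word i: find its soft nearest clip, project that clip back onto
    all words, and penalize -log(probability of landing on word i again).
    """
    total = 0.0
    for i, word in enumerate(words):
        clip_hat = soft_nearest_neighbor(word, clips, tau)
        back = softmax([dot(clip_hat, w) / tau for w in words])
        total += -math.log(back[i])
    return total / len(words)


# Usage: well-separated clips let each word cycle back to itself (loss near 0);
# indistinguishable clips give a uniform back-projection (loss = log 2 for 2 words).
words = [[1.0, 0.0], [0.0, 1.0]]
aligned = cycle_consistency_loss(words, [[1.0, 0.0], [0.0, 1.0]])
uninformative = cycle_consistency_loss(words, [[1.0, 1.0], [1.0, 1.0]])
```

The loss needs no temporal boundary labels, only the video and sentence themselves, which is why a loss of this kind can act as self-supervision for matching in the weakly supervised setting.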
Pages: 3252-3262
Number of pages: 11