Local Correspondence Network for Weakly Supervised Temporal Sentence Grounding

Cited by: 54
Authors
Yang, Wenfei [1 ]
Zhang, Tianzhu [1 ]
Zhang, Yongdong [1 ]
Wu, Feng [1 ]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci, Hefei 230027, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Grounding; Annotations; Two dimensional displays; Training; Feature extraction; Computational modeling; Task analysis; Weakly supervised; temporal sentence grounding;
DOI
10.1109/TIP.2021.3058614
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Weakly supervised temporal sentence grounding has better scalability and practicability than fully supervised methods in real-world application scenarios. However, most existing methods cannot model the fine-grained video-text local correspondences well and lack effective supervision for correspondence learning, and thus yield unsatisfactory performance. To address these issues, we propose an end-to-end Local Correspondence Network (LCNet) for weakly supervised temporal sentence grounding. The proposed LCNet has several merits. First, we represent video and text features in a hierarchical manner to model the fine-grained video-text correspondences. Second, we design a self-supervised cycle-consistent loss as learning guidance for video and text matching. To the best of our knowledge, this is the first work to fully explore the fine-grained correspondences between video and text for temporal sentence grounding by using self-supervised learning. Extensive experimental results on two benchmark datasets demonstrate that the proposed LCNet significantly outperforms existing weakly supervised methods.
Pages: 3252 - 3262
Page count: 11
Related papers
50 in total
  • [21] Rethinking Weakly-Supervised Video Temporal Grounding From a Game Perspective
    Fang, Xiang
    Xiong, Zeyu
    Fang, Wanlong
    Qu, Xiaoye
    Chen, Chen
    Dong, Jianfeng
    Tang, Keke
    Zhou, Pan
    Cheng, Yu
    Liu, Daizong
    COMPUTER VISION - ECCV 2024, PT XLV, 2025, 15103 : 290 - 311
  • [22] Weakly Supervised Correspondence Learning
    Wang, Zihan
    Cao, Zhangjie
    Hao, Yilun
    Sadigh, Dorsa
    2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2022), 2022,
  • [23] Memory-Guided Semantic Learning Network for Temporal Sentence Grounding
    Liu, Daizong
    Qu, Xiaoye
    Di, Xing
    Cheng, Yu
    Xu, Zichuan
    Zhou, Pan
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 1665 - 1673
  • [24] Context-aware biaffine localizing network for temporal sentence grounding
    Liu, Daizong
    Qu, Xiaoye
    Dong, Jianfeng
    Zhou, Pan
    Cheng, Yu
    Wei, Wei
    Xu, Zichuan
    Xie, Yulai
    arXiv, 2021,
  • [25] Context-aware Biaffine Localizing Network for Temporal Sentence Grounding
    Liu, Daizong
    Qu, Xiaoye
    Dong, Jianfeng
    Zhou, Pan
    Cheng, Yu
    Wei, Wei
    Xu, Zichuan
    Xie, Yulai
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 11230 - 11239
  • [26] A Survey on Temporal Sentence Grounding in Videos
    Lan, Xiaohan
    Yuan, Yitian
    Wang, Xin
    Wang, Zhi
    Zhu, Wenwu
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (02)
  • [27] Temporal Sentence Grounding in Streaming Videos
    Gan, Tian
    Wang, Xiao
    Sun, Yan
    Wu, Jianlong
    Guo, Qingpei
    Nie, Liqiang
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4637 - 4646
  • [28] Weakly-Supervised Video Object Grounding by Exploring Spatio-Temporal Contexts
    Yang, Xun
    Liu, Xueliang
    Jian, Meng
    Gao, Xinjian
    Wang, Meng
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1939 - 1947
  • [29] SaGCN: Semantic-Aware Graph Calibration Network for Temporal Sentence Grounding
    Chen, Tongbao
    Wang, Wenmin
    Han, Kangrui
    Xu, Huijuan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (06) : 3003 - 3016
  • [30] Conditional Video Diffusion Network for Fine-Grained Temporal Sentence Grounding
    Liu, Daizong
    Zhu, Jiahao
    Fang, Xiang
    Xiong, Zeyu
    Wang, Huan
    Li, Renfu
    Zhou, Pan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 5461 - 5476