Local Correspondence Network for Weakly Supervised Temporal Sentence Grounding

Cited by: 54

Authors:

Yang, Wenfei ^{[1]}

Zhang, Tianzhu ^{[1]}

Zhang, Yongdong ^{[1]}

Wu, Feng ^{[1]}

Affiliations:

[1] Univ Sci & Technol China, Sch Informat Sci, Hefei 230027, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2021 / Vol. 30

Funding:

National Natural Science Foundation of China;

Keywords:

Grounding; Annotations; Two dimensional displays; Training; Feature extraction; Computational modeling; Task analysis; Weakly supervised; temporal sentence grounding;

DOI:

10.1109/TIP.2021.3058614

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Weakly supervised temporal sentence grounding has better scalability and practicability than fully supervised methods in real-world application scenarios. However, most existing methods cannot model the fine-grained video-text local correspondences well and lack effective supervision information for correspondence learning, thus yielding unsatisfactory performance. To address these issues, we propose an end-to-end Local Correspondence Network (LCNet) for weakly supervised temporal sentence grounding. The proposed LCNet enjoys several merits. First, we represent video and text features in a hierarchical manner to model the fine-grained video-text correspondences. Second, we design a self-supervised cycle-consistent loss as learning guidance for video and text matching. To the best of our knowledge, this is the first work to fully explore the fine-grained correspondences between video and text for temporal sentence grounding by using self-supervised learning. Extensive experimental results on two benchmark datasets demonstrate that the proposed LCNet significantly outperforms existing weakly supervised methods.

Pages: 3252 - 3262

Number of pages: 11

50 items in total

[41] Knowledge-guided Pairwise Reconstruction Network for Weakly Supervised Referring Expression Grounding
Liu, Xuejing
Li, Liang
Wang, Shuhui
Zha, Zheng-Jun
Su, Li
Huang, Qingming
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 539 - 547
[42] MARN: Multi-level Attentional Reconstruction Networks for Weakly Supervised Video Temporal Grounding
Song, Yijun
Wang, Jingwen
Ma, Lin
Yu, Jun
Liang, Jinxiu
Yuan, Liu
Yu, Zhou
NEUROCOMPUTING, 2023, 554
[43] WINNER: Weakly-supervised hIerarchical decompositioN and aligNment for spatio-tEmporal video gRounding
Li, Mengze
Wang, Han
Zhang, Wenqiao
Miao, Jiaxu
Zhao, Zhou
Zhang, Shengyu
Ji, Wei
Wu, Fei
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 23090 - 23099
[44] GLNet: Global Local Network for Weakly Supervised Action Localization
Zhang, Shiwei
Song, Lin
Gao, Changxin
Sang, Nong
IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (10) : 2610 - 2622
[46] Temporal-enhanced Cross-modality Fusion Network for Video Sentence Grounding
Lv, Zezhong
Su, Bing
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1487 - 1492
[47] Parameterized multi-perspective graph learning network for temporal sentence grounding in videos
Wu, Guangli
Yang, Zhijun
Zhang, Jing
APPLIED INTELLIGENCE, 2024, 54 (17-18) : 8184 - 8199
[48] Detector-Free Weakly Supervised Grounding by Separation
Arbelle, Assaf
Doveh, Sivan
Alfassy, Amit
Shtok, Joseph
Lev, Guy
Schwartz, Eli
Kuehne, Hilde
Levi, Hila Barak
Sattigeri, Prasanna
Panda, Rameswar
Chen, Chun-Fu
Bronstein, Alex
Saenko, Kate
Ullman, Shimon
Giryes, Raja
Feris, Rogerio
Karlinsky, Leonid
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1781 - 1792
[49] Weakly Supervised Multimodal Affordance Grounding for Egocentric Images
Xu, Lingjing
Gao, Yang
Song, Wenfeng
Hao, Aimin
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 6324 - 6332
[50] Knowledge Aided Consistency for Weakly Supervised Phrase Grounding
Chen, Kan
Gao, Jiyang
Nevatia, Ram
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 4042 - 4050
