Local Correspondence Network for Weakly Supervised Temporal Sentence Grounding

Cited by: 54

Authors:

Yang, Wenfei [1]

Zhang, Tianzhu [1]

Zhang, Yongdong [1]

Wu, Feng [1]

Affiliations:

[1] Univ Sci & Technol China, Sch Informat Sci, Hefei 230027, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2021 / Vol. 30

Funding:

National Natural Science Foundation of China;

Keywords:

Grounding; Annotations; Two dimensional displays; Training; Feature extraction; Computational modeling; Task analysis; Weakly supervised; temporal sentence grounding;

DOI:

10.1109/TIP.2021.3058614

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Weakly supervised temporal sentence grounding has better scalability and practicability than fully supervised methods in real-world application scenarios. However, most existing methods cannot model the fine-grained video-text local correspondences well and lack effective supervision for correspondence learning, and thus yield unsatisfactory performance. To address these issues, we propose an end-to-end Local Correspondence Network (LCNet) for weakly supervised temporal sentence grounding. The proposed LCNet has two main merits. First, we represent video and text features in a hierarchical manner to model the fine-grained video-text correspondences. Second, we design a self-supervised cycle-consistent loss as learning guidance for video and text matching. To the best of our knowledge, this is the first work to fully explore the fine-grained correspondences between video and text for temporal sentence grounding by using self-supervised learning. Extensive experimental results on two benchmark datasets demonstrate that the proposed LCNet significantly outperforms existing weakly supervised methods.

Citation

Pages: 3252 - 3262

Number of pages: 11


[21] Rethinking Weakly-Supervised Video Temporal Grounding From a Game Perspective
Fang, Xiang
Xiong, Zeyu
Fang, Wanlong
Qu, Xiaoye
Chen, Chen
Dong, Jianfeng
Tang, Keke
Zhou, Pan
Cheng, Yu
Liu, Daizong
COMPUTER VISION - ECCV 2024, PT XLV, 2025, 15103 : 290 - 311
[22] Weakly Supervised Correspondence Learning
Wang, Zihan
Cao, Zhangjie
Hao, Yilun
Sadigh, Dorsa
2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2022), 2022,
[23] Memory-Guided Semantic Learning Network for Temporal Sentence Grounding
Liu, Daizong
Qu, Xiaoye
Di, Xing
Cheng, Yu
Xu, Zichuan
Zhou, Pan
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 1665 - 1673
[24] Context-aware biaffine localizing network for temporal sentence grounding
Liu, Daizong
Qu, Xiaoye
Dong, Jianfeng
Zhou, Pan
Cheng, Yu
Wei, Wei
Xu, Zichuan
Xie, Yulai
arXiv, 2021,
[25] Context-aware Biaffine Localizing Network for Temporal Sentence Grounding
Liu, Daizong
Qu, Xiaoye
Dong, Jianfeng
Zhou, Pan
Cheng, Yu
Wei, Wei
Xu, Zichuan
Xie, Yulai
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 11230 - 11239
[26] A Survey on Temporal Sentence Grounding in Videos
Lan, Xiaohan
Yuan, Yitian
Wang, Xin
Wang, Zhi
Zhu, Wenwu
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (02)
[27] Temporal Sentence Grounding in Streaming Videos
Gan, Tian
Wang, Xiao
Sun, Yan
Wu, Jianlong
Guo, Qingpei
Nie, Liqiang
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4637 - 4646
[28] Weakly-Supervised Video Object Grounding by Exploring Spatio-Temporal Contexts
Yang, Xun
Liu, Xueliang
Jian, Meng
Gao, Xinjian
Wang, Meng
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1939 - 1947
[29] SaGCN: Semantic-Aware Graph Calibration Network for Temporal Sentence Grounding
Chen, Tongbao
Wang, Wenmin
Han, Kangrui
Xu, Huijuan
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (06) : 3003 - 3016
[30] Conditional Video Diffusion Network for Fine-Grained Temporal Sentence Grounding
Liu, Daizong
Zhu, Jiahao
Fang, Xiang
Xiong, Zeyu
Wang, Huan
Li, Renfu
Zhou, Pan
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 5461 - 5476