Unsupervised Temporal Video Grounding with Deep Semantic Clustering

Authors
Liu, Daizong [1, 2]
Qu, Xiaoye [2]
Wang, Yinzhen [3]
Di, Xing [4]
Zou, Kai [4]
Cheng, Yu [5]
Xu, Zichuan [6]
Zhou, Pan [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Hubei Engn Res Ctr Big Data Secur, Wuhan, Hubei, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan, Hubei, Peoples R China
[3] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan, Hubei, Peoples R China
[4] ProtagoLabs Inc, Vienna, Austria
[5] Microsoft Res, Redmond, WA USA
[6] Dalian Univ Technol, Dalian, Peoples R China
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Temporal video grounding (TVG) aims to localize a target segment in a video according to a given sentence query. Although existing methods have achieved promising results on this task, they rely heavily on abundant video-query paired data, which is expensive and time-consuming to collect in real-world scenarios. In this paper, we explore whether a video grounding model can be learned without any paired annotations. To the best of our knowledge, this is the first work to address TVG in an unsupervised setting. Since no paired supervision is available, we propose a novel Deep Semantic Clustering Network (DSCNet) that leverages semantic information from the whole query set to compose the possible activities in each video for grounding. Specifically, we first develop a language semantic mining module, which extracts implicit semantic features from the whole query set. These language semantic features then guide the composition of activities in the video via a video-based semantic aggregation module. Finally, we use a foreground attention branch to filter out redundant background activities and refine the grounding results. To validate the effectiveness of DSCNet, we conduct experiments on the ActivityNet Captions and Charades-STA datasets. The results demonstrate that DSCNet achieves competitive performance and even outperforms most weakly-supervised approaches.
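The three stages named in the abstract can be illustrated with a toy sketch. This is not the authors' DSCNet: plain k-means stands in for the language semantic mining module, a dot-product softmax stands in for the video-based semantic aggregation module, and a fixed threshold stands in for the foreground attention branch. All function names, the 2-D feature vectors, and the threshold value are assumptions made only for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means over query embeddings (assumed stand-in for semantic mining)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_cluster_attention(frames, centroids):
    """Per frame, a softmax over dot-product similarity to each semantic centroid."""
    return [
        softmax([sum(f * c for f, c in zip(frame, cen)) for cen in centroids])
        for frame in frames
    ]


def ground_segment(foreground, threshold=0.5):
    """Longest run of frames scoring >= threshold, as (start, end) indices, or None."""
    best, start = None, None
    # A sentinel below any threshold closes a run that reaches the last frame.
    for i, score in enumerate(list(foreground) + [float("-inf")]):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if best is None or (i - start) > (best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    return best


# Toy data: four query embeddings forming two semantic groups, and six frames.
queries = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
frames = [[0, 3], [0, 3], [3, 0], [3, 0], [3, 0], [0, 3]]

centroids = kmeans(queries, k=2)
attention = frame_cluster_attention(frames, centroids)

# Ground the activity whose centroid lies closest to the first feature axis.
target = max(range(len(centroids)), key=lambda c: centroids[c][0])
segment = ground_segment([row[target] for row in attention])
print(segment)
```

On this toy input the selected segment covers frames 2 through 4, the frames whose features align with the chosen semantic centroid. No paired video-query labels are used at any step, which is the property the unsupervised setting requires.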
Pages: 1683-1691
Number of pages: 9