Dense video captioning using unsupervised semantic information

Cited by: 0
Authors
Estevam, Valter [1, 2]
Laroca, Rayson [2, 3]
Pedrini, Helio [4]
Menotti, David [2]
Affiliations
[1] Fed Inst Parana, BR-84507302 Irati, PR, Brazil
[2] Univ Fed Parana, Dept Informat, BR-81531970 Curitiba, PR, Brazil
[3] Pontificia Univ Catolica Parana, Postgrad Program Informat, BR-80215901 Curitiba, PR, Brazil
[4] Univ Estadual Campinas, Inst Comp, BR-13083852 Campinas, SP, Brazil
Keywords
Visual similarity; Unsupervised learning; Co-occurrence estimation; Self-attention; Bi-modal attention
DOI
10.1016/j.jvcir.2024.104385
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
We introduce a method to learn unsupervised semantic visual information based on the premise that complex events can be decomposed into simpler events and that these simple events are shared across several complex events. We first employ a clustering method to group visual representations, producing a visual codebook. Then, we learn a dense representation by encoding the co-occurrence probability matrix of the codebook entries. This representation improves the performance of the dense video captioning task in a scenario with only visual features. For example, we replace the audio signal in the BMT method with our representation and produce temporal proposals with comparable performance. Furthermore, we concatenate the visual representation with our descriptor in a vanilla transformer method to achieve state-of-the-art performance in the captioning subtask among methods that use only visual features, as well as competitive performance against multi-modal methods. Our code is available at https://github.com/valterlej/dvcusi.
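The codebook and co-occurrence steps summarized above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation (the linked repository holds that), and several choices are assumptions made only for illustration: all function names are hypothetical, k-means is seeded with the first k clip features, co-occurrence is counted between clips that fall within a fixed temporal window of each other in the same video, and the row of the normalized co-occurrence matrix is used directly as the semantic descriptor, whereas the paper learns a dense encoding of that matrix.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(feature: Sequence[float], codebook: List[Vector]) -> int:
    """Index of the codebook entry (visual word) closest to `feature`."""
    return min(range(len(codebook)), key=lambda c: squared_distance(feature, codebook[c]))


def build_codebook(features: List[Vector], k: int, iterations: int = 20) -> List[Vector]:
    """Lloyd's k-means (hypothetical helper); the first k features seed the centroids
    as an assumption that keeps the sketch deterministic."""
    codebook = [list(f) for f in features[:k]]
    for _ in range(iterations):
        members: List[List[Vector]] = [[] for _ in range(k)]
        for f in features:
            members[nearest_codeword(f, codebook)].append(f)
        for c, group in enumerate(members):
            if group:  # an empty cluster keeps its previous centroid
                codebook[c] = [sum(column) / len(group) for column in zip(*group)]
    return codebook


def cooccurrence_probabilities(videos: List[List[int]], k: int, window: int = 1) -> List[List[float]]:
    """Row-normalized counts of visual words seen within `window` clips of each other.

    Entry [i][j] estimates P(word j nearby | word i), pooled over all videos.
    The windowed definition of co-occurrence is an assumption of this sketch.
    """
    counts = [[0.0] * k for _ in range(k)]
    for words in videos:
        for t, w in enumerate(words):
            for u in range(max(0, t - window), min(len(words), t + window + 1)):
                if u != t:
                    counts[w][words[u]] += 1.0
    probabilities = []
    for row in counts:
        total = sum(row)
        probabilities.append([c / total if total else 0.0 for c in row])
    return probabilities


def fused_feature(feature: Vector, codebook: List[Vector], probs: List[List[float]]) -> Vector:
    """Concatenate a clip's visual feature with its word's co-occurrence row,
    a stand-in for the learned dense descriptor described in the abstract."""
    return list(feature) + probs[nearest_codeword(feature, codebook)]


if __name__ == "__main__":
    # Toy 2-D "clip features" from one video, alternating between two simple events.
    clips = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.1, 5.0], [0.0, 0.1], [5.0, 5.1]]
    codebook = build_codebook(clips, k=2)
    words = [nearest_codeword(f, codebook) for f in clips]
    probs = cooccurrence_probabilities([words], k=2)
    print(words)                                  # [0, 1, 0, 1, 0, 1]
    print(fused_feature(clips[0], codebook, probs))  # [0.0, 0.0, 0.0, 1.0]
```

Pooling the counts over many videos is what lets each visual word's row reflect which other simple events tend to appear around it across different complex events; with real data the inputs would be per-clip feature vectors rather than toy 2-D points.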
Pages: 10