Dense video captioning using unsupervised semantic information

Authors
Estevam, Valter [1, 2]
Laroca, Rayson [2, 3]
Pedrini, Helio [4]
Menotti, David [2]
Affiliations
[1] Fed Inst Parana, BR-84507302 Irati, PR, Brazil
[2] Univ Fed Parana, Dept Informat, BR-81531970 Curitiba, PR, Brazil
[3] Pontificia Univ Catolica Parana, Postgrad Program Informat, BR-80215901 Curitiba, PR, Brazil
[4] Univ Estadual Campinas, Inst Comp, BR-13083852 Campinas, SP, Brazil
Keywords
Visual similarity; Unsupervised learning; Co-occurrence estimation; Self-attention; Bi-modal attention
DOI
10.1016/j.jvcir.2024.104385
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We introduce a method to learn unsupervised semantic visual information based on the premise that complex events can be decomposed into simpler events and that these simple events are shared across several complex events. We first employ a clustering method to group representations, producing a visual codebook. Then, we learn a dense representation by encoding the co-occurrence probability matrix for the codebook entries. This representation improves performance on the dense video captioning task in a scenario with only visual features. For example, we replace the audio signal in the BMT method and produce temporal proposals with comparable performance. Furthermore, we concatenate the visual representation with our descriptor in a vanilla transformer method to achieve state-of-the-art performance in the captioning subtask among methods that use only visual features, as well as competitive performance with multi-modal methods. Our code is available at https://github.com/valterlej/dvcusi.
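The first two steps of the abstract (clustering into a visual codebook, then estimating co-occurrence probabilities between codebook entries) can be illustrated with a small sketch. This is not the authors' implementation: plain k-means, the sliding-window co-occurrence count, the row normalization, the toy random features, and all function names and parameters (`build_codebook`, `assign_codes`, `cooccurrence_probabilities`, `k`, `window`) are assumptions made for illustration. The sketch stops at the probability matrix and does not cover learning the dense representation from it.

```python
import numpy as np


def assign_codes(features, centroids):
    """Index of the nearest centroid (squared Euclidean) for each feature."""
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def build_codebook(features, k, iters=20, seed=0):
    """Plain k-means (an assumed clustering choice); returns (k, d) centroids."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=k, replace=False)
    centroids = features[start].copy()
    for _ in range(iters):
        labels = assign_codes(features, centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def cooccurrence_probabilities(code_sequences, k, window=2):
    """Row-normalized counts of codes seen within `window` steps of each other."""
    counts = np.zeros((k, k), dtype=np.float64)
    for codes in code_sequences:
        for i, a in enumerate(codes):
            lo, hi = max(0, i - window), min(len(codes), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[a, codes[j]] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observations stay all-zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


# Toy data (assumption): 3 "videos", each a sequence of 40 clip features of size 8.
rng = np.random.default_rng(0)
videos = [rng.normal(size=(40, 8)) for _ in range(3)]
codebook = build_codebook(np.concatenate(videos), k=5)
sequences = [assign_codes(v, codebook) for v in videos]
probs = cooccurrence_probabilities(sequences, k=5)
```

Here `probs[a, b]` estimates how often codebook entry `b` appears near entry `a` in time; real clip features from a video backbone would take the place of the random arrays.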
Pages: 10