Joint multi-scale information and long-range dependence for video captioning

Times cited: 1
Authors
Zhai, Zhongyi [1]
Chen, Xiaofeng [1]
Huang, Yishuang [1]
Zhao, Lingzhong [1]
Cheng, Bo [2]
He, Qian [1]
Affiliations
[1] Guilin University of Electronic Technology, Guangxi Key Laboratory of Trusted Software, Guilin 541004, People's Republic of China
[2] Beijing University of Posts and Telecommunications, State Key Laboratory of Networking and Switching Technology, Beijing 100876, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Video captioning; Multi-scale; Non-local; Long-range dependence
DOI
10.1007/s13735-023-00303-7
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Since deep learning methods have achieved great success in both computer vision and natural language processing, video captioning, which draws on both fields, has also attracted extensive attention. Video captioning is a challenging task that aims to describe video content in natural language to make videos easier to understand. Most current video captioning research focuses on describing the behavior of the main objects in a video, and especially on holistic understanding of the content. As a result, most video captioning work overlooks the characteristics of smaller objects in the video, producing ambiguous, imprecise, or even fundamentally wrong descriptions. In this paper, a novel video captioning method, MSLR, is proposed, which improves the accuracy of video descriptions by extracting features of video objects at different granularities and preserving long-range temporal dependencies. Specifically, the proposed method performs convolution operations at different scales to obtain spatial features of different granularity and then fuses them into a unified spatial representation. On this basis, a temporal extraction network is built from non-local blocks to preserve the long-range dependencies in videos. Experiments on two popular benchmark datasets demonstrate the superiority of MSLR over previous state-of-the-art methods, and ablation experiments and text evaluation verify the effectiveness of MSLR's components.
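The abstract names two building blocks: parallel convolutions at several scales whose outputs are fused into one spatial representation, and non-local blocks that relate every time step to every other. The NumPy sketch below illustrates both under assumptions that are ours, not the paper's: kernel sizes 1, 3 and 5, a ReLU on each branch, fusion by channel concatenation followed by a 1×1 projection, spatial global average pooling, and the embedded-Gaussian form of the non-local block with a residual connection. All function names, shapes and sizes are hypothetical, and the weights are random rather than learned, so this illustrates the techniques and is not the MSLR implementation.

```python
import numpy as np


def conv2d_same(x, w):
    """Naive zero-padded 'same' 2-D convolution (cross-correlation, as in DL libraries).

    x: (H, W, C_in) feature map; w: (k, k, C_in, C_out) kernel with odd k.
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros((H, W, w.shape[3]))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k, :]             # (k, k, C_in) window
            out[i, j] = np.tensordot(patch, w, axes=3)  # sum over k, k, C_in -> (C_out,)
    return out


def multi_scale_features(frames, kernels, w_fuse):
    """Hypothetical multi-scale spatial encoder.

    frames: (T, H, W, C); kernels: one (k, k, C, C_s) weight per scale;
    w_fuse: (num_scales * C_s, D). Returns one D-dim vector per frame, shape (T, D).
    """
    feats = []
    for frame in frames:
        # One branch per kernel size; small kernels keep fine detail, large ones add context.
        branches = [np.maximum(conv2d_same(frame, w), 0.0) for w in kernels]
        # Concatenate channels, then a 1x1 convolution (a per-pixel linear map) fuses the scales.
        fused = np.concatenate(branches, axis=-1) @ w_fuse  # (H, W, D)
        feats.append(fused.mean(axis=(0, 1)))               # global average pooling over space
    return np.stack(feats)


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def non_local_temporal(x, w_theta, w_phi, w_g, w_z):
    """Embedded-Gaussian non-local block applied along time. x: (T, D)."""
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g  # (T, d) each
    attn = softmax(theta @ phi.T, axis=-1)           # (T, T): each step weighs every step
    y = attn @ g                                     # (T, d) aggregated over all time steps
    return x + y @ w_z                               # residual connection, output (T, D)


rng = np.random.default_rng(0)
T, H, W, C, Cs, D, d = 6, 8, 8, 3, 4, 16, 8
frames = rng.standard_normal((T, H, W, C))
kernels = [rng.standard_normal((k, k, C, Cs)) * 0.1 for k in (1, 3, 5)]
w_fuse = rng.standard_normal((3 * Cs, D)) * 0.1
spatial = multi_scale_features(frames, kernels, w_fuse)
w_theta, w_phi, w_g = (rng.standard_normal((D, d)) * 0.1 for _ in range(3))
w_z = rng.standard_normal((d, D)) * 0.1
temporal = non_local_temporal(spatial, w_theta, w_phi, w_g, w_z)
print(spatial.shape, temporal.shape)  # (6, 16) (6, 16)
```

Because the attention matrix in `non_local_temporal` is T × T, every frame contributes directly to every output step regardless of how far apart they are, which is one way to read "preserving long-range dependencies"; a recurrent encoder would instead pass that information through up to T sequential steps.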
Pages: 14
Published in
International Journal of Multimedia Information Retrieval, 2023, 12