EDS: Exploring deeper into semantics for video captioning

Cited by: 0
Authors
Lou, Yibo [1]
Zhang, Wenjie [1]
Song, Xiaoning [1, 2]
Hua, Yang [1]
Wu, Xiao-Jun [1]
Affiliations
[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi 214122, Peoples R China
[2] DiTu Suzhou Biotechnol Co Ltd, Suzhou 215000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video captioning; Text generation; Semantic information; GENERATION;
DOI
10.1016/j.patrec.2024.09.017
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Efficiently leveraging semantic information has been crucial to recent advances in video captioning. However, prevailing approaches that design various Part-of-Speech (POS) tags as prior information lack essential linguistic knowledge guidance throughout training, particularly for POS and initial description generation. Furthermore, restricting the model to a single source of semantic information ignores the varied interpretations inherent in each video. To address these problems, we propose the Exploring Deeper into Semantics (EDS) method for video captioning. EDS comprises three modules that focus on semantic information. First, the Semantic Supervised Generation (SSG) module integrates semantic information as a prior and facilitates enriched interrelations among words for POS supervision. Second, a novel Similarity Semantic Extension (SSE) module employs query-based semantic expansion to collaboratively generate fine-grained content. Third, the Input Semantic Enhancement (ISE) module provides a strategy for mitigating the information constraints faced during the initial phase of word generation. Experiments show that, by exploiting semantic information through supervision, extension, and enhancement, EDS yields promising results, underlining the effectiveness of the approach. Code will be available at https://github.com/BradenJoson/EDS.
Pages: 133-140
Number of pages: 8
Related papers
50 in total
  • [1] Delving Deeper into the Decoder for Video Captioning
    Chen, Haoran
    Li, Jianmin
    Hu, Xiaolin
    ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 1079 - 1086
  • [2] Brain-inspired learning to deeper inductive reasoning for video captioning
    Xiao Yao
    Feiyang Xu
    Min Gu
    Peipei Wang
    International Journal of Machine Learning and Cybernetics, 2023, 14 : 3979 - 3991
  • [3] Brain-inspired learning to deeper inductive reasoning for video captioning
    Yao, Xiao
    Xu, Feiyang
    Gu, Min
    Wang, Peipei
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (11) : 3979 - 3991
  • [4] Domain-Specific Semantics Guided Approach to Video Captioning
    Hemalatha, M.
    Sekhar, C. Chandra
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1576 - 1585
  • [5] Rich Visual and Language Representation with Complementary Semantics for Video Captioning
    Tang, Pengjie
    Wang, Hanli
    Li, Qinyu
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2019, 15 (02)
  • [6] Exploring Group Video Captioning with Efficient Relational Approximation
    Lin, Wang
    Jin, Tao
    Wang, Ye
    Pan, Wenwen
    Li, Linjun
    Cheng, Xize
    Zhao, Zhou
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15235 - 15244
  • [7] Guiding the Flowing of Semantics: Interpretable Video Captioning via POS Tag
    Xiao, Xinyu
    Wang, Lingfeng
    Fan, Bin
    Xiang, Shiming
    Pan, Chunhong
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 2068 - 2077
  • [8] A Semantics-Assisted Video Captioning Model Trained With Scheduled Sampling
    Chen, Haoran
    Lin, Ke
    Maye, Alexander
    Li, Jianmin
    Hu, Xiaolin
    FRONTIERS IN ROBOTICS AND AI, 2020, 7
  • [9] Going Deeper with Semantics: Video Activity Interpretation using Semantic Contextualization
    Aakur, Sathyanarayanan
    de Souza, Fillipe D. M.
    Sarkar, Sudeep
    2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, : 190 - 199
  • [10] Exploring the Spatio-Temporal Aware Graph for video captioning
    Xue, Ping
    Zhou, Bing
    IET COMPUTER VISION, 2022, 16 (05) : 456 - 467