EDS: Exploring deeper into semantics for video captioning

Cited: 0
Authors
Lou, Yibo [1]
Zhang, Wenjie [1]
Song, Xiaoning [1,2]
Hua, Yang [1]
Wu, Xiao-Jun [1]
Affiliations
[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi 214122, Peoples R China
[2] DiTu Suzhou Biotechnol Co Ltd, Suzhou 215000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video captioning; Text generation; Semantic information; GENERATION;
DOI
10.1016/j.patrec.2024.09.017
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Efficiently leveraging semantic information has been crucial to recent advances in video captioning. However, prevailing approaches that design various Part-of-Speech (POS) tags as prior information lack essential linguistic knowledge guidance throughout the training procedure, particularly for POS prediction and initial description generation. Furthermore, restricting the model to a single source of semantic information ignores the varied interpretations inherent in each video. To address these problems, we propose the Exploring Deeper into Semantics (EDS) method for video captioning. EDS comprises three modules that focus on semantic information. Specifically, we propose the Semantic Supervised Generation (SSG) module, which integrates semantic information as a prior and enriches the interrelations among words for POS supervision. A novel Similarity Semantic Extension (SSE) module employs query-based semantic expansion to collaboratively generate fine-grained content. Additionally, the proposed Input Semantic Enhancement (ISE) module mitigates the information constraints faced during the initial phase of word generation. Experiments show that, by exploiting semantic information through supervision, extension, and enhancement, EDS yields promising results and demonstrates the effectiveness of the proposed modules. Code will be available at https://github.com/BradenJoson/EDS.
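This record gives no implementation details for the three modules. Below is a minimal, hypothetical PyTorch sketch of one plausible reading of the query-based semantic expansion attributed to SSE: a pooled video query retrieves its top-k most similar entries from an external bank of semantic embeddings and fuses them back into the query. The class name, semantic_bank, top_k, and all tensor shapes are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilaritySemanticExtensionSketch(nn.Module):
    # Hypothetical sketch: expand a video-level query with the top-k most
    # similar entries from an external bank of semantic embeddings, then
    # fuse them back into the query. Names and shapes are assumptions.
    def __init__(self, dim: int, top_k: int = 5):
        super().__init__()
        self.top_k = top_k
        self.fuse = nn.Linear(2 * dim, dim)  # fuse query with retrieved semantics

    def forward(self, query: torch.Tensor, semantic_bank: torch.Tensor) -> torch.Tensor:
        # query: (batch, dim); semantic_bank: (num_candidates, dim)
        sim = F.normalize(query, dim=-1) @ F.normalize(semantic_bank, dim=-1).t()
        topk_sim, topk_idx = sim.topk(self.top_k, dim=-1)     # (batch, k)
        retrieved = semantic_bank[topk_idx]                    # (batch, k, dim)
        weights = topk_sim.softmax(dim=-1).unsqueeze(-1)       # (batch, k, 1)
        pooled = (weights * retrieved).sum(dim=1)              # (batch, dim)
        return self.fuse(torch.cat([query, pooled], dim=-1))   # (batch, dim)

if __name__ == "__main__":
    torch.manual_seed(0)
    module = SimilaritySemanticExtensionSketch(dim=512, top_k=5)
    video_query = torch.randn(2, 512)   # e.g. pooled video features
    bank = torch.randn(1000, 512)       # e.g. precomputed semantic embeddings
    print(module(video_query, bank).shape)  # torch.Size([2, 512])

Weighting the retrieved embeddings by their softmax-normalized similarities is one simple fusion choice; an attention-based fusion would be an equally plausible alternative under the same assumptions.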
Pages: 133-140
Page count: 8
Related Papers
50 records in total
  • [31] Improving distinctiveness in video captioning with text-video similarity
    Velda, Vania
    Immanuel, Steve Andreas
    Hendria, Willy Fitra
    Jeong, Cheol
    IMAGE AND VISION COMPUTING, 2023, 136
  • [32] Quality Enhancement Based Video Captioning in Video Communication Systems
    Le, The Van
    Lee, Jin Young
    IEEE ACCESS, 2024, 12 : 40989 - 40999
  • [33] Multiple Videos Captioning Model for Video Storytelling
    Han, Seung-Ho
    Go, Bo-Won
    Choi, Ho-Jin
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2019, : 355 - 358
  • [34] Early Embedding and Late Reranking for Video Captioning
    Dong, Jianfeng
    Li, Xirong
    Lan, Weiyu
    Huo, Yujia
    Snoek, Cees G. M.
    MM'16: PROCEEDINGS OF THE 2016 ACM MULTIMEDIA CONFERENCE, 2016, : 1082 - 1086
  • [35] Efficient Video Captioning on Heterogeneous System Architectures
    Huang, Horng-Ruey
    Hong, Ding-Yong
    Wu, Jan-Jan
    Liu, Pangfeng
    Hsu, Wei-Chung
    2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2021, : 1035 - 1045
  • [36] Incorporating Textual Similarity in Video Captioning Schemes
    Gkountakos, Konstantinos
    Dimou, Anastasios
    Papadopoulos, Georgios Th.
    Daras, Petros
    2019 IEEE INTERNATIONAL CONFERENCE ON ENGINEERING, TECHNOLOGY AND INNOVATION (ICE/ITMC), 2019,
  • [37] Video captioning with global and local text attention
    Peng, Yuqing
    Wang, Chenxi
    Pei, Yixin
    Li, Yingjun
    VISUAL COMPUTER, 2022, 38 (12): : 4267 - 4278
  • [38] Global semantic enhancement network for video captioning
    Luo, Xuemei
    Luo, Xiaotong
    Wang, Di
    Liu, Jinhui
    Wan, Bo
    Zhao, Lin
    PATTERN RECOGNITION, 2024, 145
  • [39] Exploiting the local temporal information for video captioning
    Wei, Ran
    Mi, Li
    Hu, Yaosi
    Chen, Zhenzhong
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2020, 67 (67)
  • [40] Image and Video Captioning with Augmented Neural Architectures
    Shetty, Rakshith
    Tavakoli, Hamed R.
    Laaksonen, Jorma
    IEEE MULTIMEDIA, 2018, 25 (02) : 34 - 46