EDS: Exploring deeper into semantics for video captioning

Cited: 0
Authors
Lou, Yibo [1]
Zhang, Wenjie [1]
Song, Xiaoning [1,2]
Hua, Yang [1]
Wu, Xiao-Jun [1]
Affiliations
[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi 214122, Peoples R China
[2] DiTu Suzhou Biotechnol Co Ltd, Suzhou 215000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video captioning; Text generation; Semantic information; GENERATION;
DOI
10.1016/j.patrec.2024.09.017
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Efficiently leveraging semantic information has been crucial to recent advances in video captioning. However, prevailing approaches that design various Part-of-Speech (POS) tags as prior information lack essential linguistic knowledge guidance throughout the training procedure, particularly for POS prediction and initial description generation. Furthermore, restricting the model to a single source of semantic information ignores the varied interpretations inherent in each video. To address these problems, we propose the Exploring Deeper into Semantics (EDS) method for video captioning. EDS comprises three modules that focus on semantic information. Specifically, we propose the Semantic Supervised Generation (SSG) module, which integrates semantic information as a prior and enriches the interrelations among words for POS supervision. A novel Similarity Semantic Extension (SSE) module employs query-based semantic expansion to collaboratively generate fine-grained content. Additionally, the proposed Input Semantic Enhancement (ISE) module mitigates the information constraints faced during the initial phase of word generation. Experiments show that, by exploiting semantic information through supervision, extension, and enhancement, EDS yields promising results and demonstrates the effectiveness of the proposed modules. Code will be available at https://github.com/BradenJoson/EDS.
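This record gives no implementation details for the three modules. Below is a minimal, hypothetical PyTorch sketch of one plausible reading of the query-based semantic expansion attributed to SSE: a pooled video query retrieves its top-k most similar entries from an external bank of semantic embeddings and fuses them back into the query. The class name, semantic_bank, top_k, and all tensor shapes are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilaritySemanticExtensionSketch(nn.Module):
    # Hypothetical sketch: expand a video-level query with the top-k most
    # similar entries from an external bank of semantic embeddings, then
    # fuse them back into the query. Names and shapes are assumptions.
    def __init__(self, dim: int, top_k: int = 5):
        super().__init__()
        self.top_k = top_k
        self.fuse = nn.Linear(2 * dim, dim)  # fuse query with retrieved semantics

    def forward(self, query: torch.Tensor, semantic_bank: torch.Tensor) -> torch.Tensor:
        # query: (batch, dim); semantic_bank: (num_candidates, dim)
        sim = F.normalize(query, dim=-1) @ F.normalize(semantic_bank, dim=-1).t()
        topk_sim, topk_idx = sim.topk(self.top_k, dim=-1)     # (batch, k)
        retrieved = semantic_bank[topk_idx]                    # (batch, k, dim)
        weights = topk_sim.softmax(dim=-1).unsqueeze(-1)       # (batch, k, 1)
        pooled = (weights * retrieved).sum(dim=1)              # (batch, dim)
        return self.fuse(torch.cat([query, pooled], dim=-1))   # (batch, dim)

if __name__ == "__main__":
    torch.manual_seed(0)
    module = SimilaritySemanticExtensionSketch(dim=512, top_k=5)
    video_query = torch.randn(2, 512)   # e.g. pooled video features
    bank = torch.randn(1000, 512)       # e.g. precomputed semantic embeddings
    print(module(video_query, bank).shape)  # torch.Size([2, 512])

Weighting the retrieved embeddings by their softmax-normalized similarities is one simple fusion choice; an attention-based fusion would be an equally plausible alternative under the same assumptions.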
Pages: 133-140
Page count: 8
Related Papers
50 records in total
  • [31] Improving distinctiveness in video captioning with text-video similarity
    Velda, Vania
    Immanuel, Steve Andreas
    Hendria, Willy Fitra
    Jeong, Cheol
    IMAGE AND VISION COMPUTING, 2023, 136
  • [32] Quality Enhancement Based Video Captioning in Video Communication Systems
    Le, The Van
    Lee, Jin Young
    IEEE ACCESS, 2024, 12 : 40989 - 40999
  • [33] Multiple Videos Captioning Model for Video Storytelling
    Han, Seung-Ho
    Go, Bo-Won
    Choi, Ho-Jin
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2019, : 355 - 358
  • [34] Early Embedding and Late Reranking for Video Captioning
    Dong, Jianfeng
    Li, Xirong
    Lan, Weiyu
    Huo, Yujia
    Snoek, Cees G. M.
    MM'16: PROCEEDINGS OF THE 2016 ACM MULTIMEDIA CONFERENCE, 2016, : 1082 - 1086
  • [35] Efficient Video Captioning on Heterogeneous System Architectures
    Huang, Horng-Ruey
    Hong, Ding-Yong
    Wu, Jan-Jan
    Liu, Pangfeng
    Hsu, Wei-Chung
    2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2021, : 1035 - 1045
  • [36] Incorporating Textual Similarity in Video Captioning Schemes
    Gkountakos, Konstantinos
    Dimou, Anastasios
    Papadopoulos, Georgios Th.
    Daras, Petros
    2019 IEEE INTERNATIONAL CONFERENCE ON ENGINEERING, TECHNOLOGY AND INNOVATION (ICE/ITMC), 2019,
  • [37] Video captioning with global and local text attention
    Peng, Yuqing
    Wang, Chenxi
    Pei, Yixin
    Li, Yingjun
    VISUAL COMPUTER, 2022, 38 (12): : 4267 - 4278
  • [38] Global semantic enhancement network for video captioning
    Luo, Xuemei
    Luo, Xiaotong
    Wang, Di
    Liu, Jinhui
    Wan, Bo
    Zhao, Lin
    PATTERN RECOGNITION, 2024, 145
  • [39] Exploiting the local temporal information for video captioning
    Wei, Ran
    Mi, Li
    Hu, Yaosi
    Chen, Zhenzhong
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2020, 67 (67)
  • [40] Image and Video Captioning with Augmented Neural Architectures
    Shetty, Rakshith
    Tavakoli, Hamed R.
    Laaksonen, Jorma
    IEEE MULTIMEDIA, 2018, 25 (02) : 34 - 46