EDS: Exploring deeper into semantics for video captioning

Times cited: 0

Authors:

Lou, Yibo [1]

Zhang, Wenjie [1]

Song, Xiaoning [1,2]

Hua, Yang [1]

Wu, Xiao-Jun [1]

Affiliations:

[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi 214122, Peoples R China

[2] DiTu Suzhou Biotechnol Co Ltd, Suzhou 215000, Peoples R China

Source:

PATTERN RECOGNITION LETTERS | 2024 / Vol. 186

Funding:

National Natural Science Foundation of China;

Keywords:

Video captioning; Text generation; Semantic information; GENERATION;

DOI:

10.1016/j.patrec.2024.09.017

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Efficiently leveraging semantic information has become crucial for advancing video captioning in recent years. However, prevailing approaches that design various Part-of-Speech (POS) tags as prior information lack essential linguistic guidance throughout training, particularly during POS generation and initial description generation. Furthermore, relying on a single source of semantic information ignores the varied interpretations that each video may admit. To solve these problems, we propose the Exploring Deeper into Semantics (EDS) method for video captioning. EDS comprises three modules that focus on semantic information. Specifically, the Semantic Supervised Generation (SSG) module integrates semantic information as a prior and enriches the interrelations among words for POS supervision. A novel Similarity Semantic Extension (SSE) module employs query-based semantic expansion to collaboratively generate fine-grained content. Additionally, the Input Semantic Enhancement (ISE) module provides a strategy for mitigating the information constraints faced during the initial phase of word generation. Experiments show that, by exploiting semantic information through supervision, extension, and enhancement, EDS not only yields promising results but also demonstrates the effectiveness of this approach. Code will be available at https://github.com/BradenJoson/EDS.
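The abstract describes the SSE module only as a "query-based semantic expansion". As a rough, hypothetical illustration of that general idea (not the EDS implementation, whose code the authors say will be released at the repository above), the sketch below ranks the entries of an assumed semantic bank by cosine similarity to a query embedding and keeps the top k. The function names `cosine` and `expand_semantics`, the bank contents, and the toy 3-dimensional embeddings are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical semantic bank entry: (text label, embedding vector).
BankEntry = Tuple[str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_semantics(
    query: Sequence[float], bank: List[BankEntry], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k bank labels most similar to the query, highest similarity first."""
    scored = [(label, cosine(query, emb)) for label, emb in bank]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-d embeddings, invented purely for illustration.
    bank = [
        ("a man is cooking", [1.0, 0.0, 0.2]),
        ("a dog runs in a park", [0.0, 1.0, 0.0]),
        ("a person prepares food", [0.9, 0.1, 0.3]),
    ]
    video_query = [1.0, 0.0, 0.25]
    for label, score in expand_semantics(video_query, bank, k=2):
        print(f"{score:.3f}  {label}")
```

In a real captioning model the query would presumably come from the video encoder and the bank from caption or concept embeddings; how EDS actually builds, queries, and fuses these is not stated in the abstract.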


Pages: 133-140

Number of pages: 8
