BiTransformer: augmenting semantic context in video captioning via bidirectional decoder

Times cited: 5

Authors:

Zhong, Maosheng [1]

Zhang, Hao [1]

Wang, Yong [1]

Xiong, Hao [1]

Affiliation:

[1] Jiangxi Normal Univ, 99 Ziyang Ave, Nanchang, Jiangxi, Peoples R China

Source:

MACHINE VISION AND APPLICATIONS | 2022 / Vol. 33 / No. 05

Keywords:

Video captioning; Bidirectional decoding; Transformer

DOI:

10.1007/s00138-022-01329-3

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Video captioning, the task of generating natural-language descriptions of the content of a video, is an important problem in many applications. Most existing methods for video captioning are based on deep encoder-decoder models, particularly attention-based models such as the Transformer. However, existing Transformer-based models may not fully exploit the semantic context: they use only left-to-right context and ignore its right-to-left counterpart. In this paper, we introduce a bidirectional (forward-backward) decoder that exploits both left-to-right and right-to-left context in a Transformer-based video captioning model, which we therefore call the bidirectional Transformer (BiTransformer). Specifically, between the encoder and the forward decoder (which captures left-to-right context) used in existing Transformer-based models, we insert a backward decoder that captures right-to-left context. With this bidirectional decoder, the semantic context of videos is exploited more fully, yielding better video captions. We demonstrate the effectiveness of our model on two benchmark datasets, MSVD and MSR-VTT, through comparison with state-of-the-art methods. In particular, on the important evaluation metric CIDEr, the proposed model outperforms state-of-the-art models by 1.2% on both datasets.
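The abstract describes the bridge between encoder, backward decoder, and forward decoder only at a high level. The following is a minimal pure-Python sketch of one plausible wiring, under assumptions that are not stated in the source: single-head attention with no learned weights, one layer per decoder, an identity function standing in for the video encoder, and a forward decoder that cross-attends to the concatenation of encoder outputs and backward-decoder states (a design borrowed from asynchronous bidirectional decoding in neural machine translation). All function names (`attention`, `decoder_layer`, `bi_decode`) and the toy inputs are hypothetical. The sketch only illustrates the data flow: backward state i summarizes words i through n-1, so every forward position gains access to right-to-left context.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows are frame or word feature vectors


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) == 0.0, so masked slots get zero weight
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix, causal: bool) -> Matrix:
    """Single-head scaled dot-product attention without learned projections."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [
            float("-inf") if causal and j > i
            else sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
            for j, kj in enumerate(k)
        ]
        w = softmax(scores)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def decoder_layer(x: Matrix, memory: Matrix) -> Matrix:
    # Masked self-attention: position i sees only positions <= i of its own input.
    h = add(x, attention(x, x, x, causal=True))
    # Unmasked cross-attention over the memory sequence.
    return add(h, attention(h, memory, memory, causal=False))


def bi_decode(video_feats: Matrix, caption: Matrix) -> Tuple[Matrix, Matrix]:
    encoder_out = video_feats  # hypothetical: identity stands in for the video encoder
    # Backward decoder reads the caption right-to-left; re-align to original order so
    # backward[i] depends only on words i, i+1, ..., n-1 (right-to-left context).
    backward = decoder_layer(caption[::-1], encoder_out)[::-1]
    # Assumed bridge: the forward decoder cross-attends to encoder outputs AND backward states.
    forward = decoder_layer(caption, encoder_out + backward)
    return forward, backward


video = [[0.1, 0.3], [0.5, -0.2], [0.0, 0.4]]   # 3 hypothetical frame features, d = 2
caption = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 hypothetical word embeddings, d = 2
fwd, bwd = bi_decode(video, caption)
print(len(fwd), len(bwd))  # 3 3
```

Changing the first word leaves backward states 1 and 2 untouched, while changing the last word alters forward state 0 through the backward memory. In a trained system the backward decoder would presumably run on its own generated draft at inference time, since feeding it the reference caption would leak future words to the forward decoder; the sketch ignores training entirely.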

Pages: 9
