Video captioning with global and local text attention

Cited by: 1
Authors
Peng, Yuqing [1 ]
Wang, Chenxi [1 ]
Pei, Yixin [1 ]
Li, Yingjun [1 ]
Affiliations
[1] Hebei Univ Technol, Sch Artificial Intelligence, Tianjin 300401, Peoples R China
Source
VISUAL COMPUTER, 2022, Vol. 38, Issue 12
Keywords
Video captioning; Global control; Local strengthening; Bidirectional; AGGREGATION
DOI
10.1007/s00371-021-02294-0
Chinese Library Classification (CLC)
TP31 [Computer Software]
Discipline codes
081202; 0835
Abstract
The task of video captioning is to generate a description that corresponds to the video content, which places stringent requirements on the extraction of fine-grained video features and on the language processing of the label text. This paper proposes a new method that applies global control of the text and local strengthening during training, so that the context can be consulted while the text is generated. In addition, more attention is given to important words in the text, such as nouns and predicate verbs; this greatly improves the recognition of objects and yields more accurate prediction of actions in the video. Moreover, 2D and 3D multimodal feature extraction is adopted in the video feature extraction process. Better results are achieved through the fine-grained feature capture of global attention and the fusion of bidirectional time flow. The method obtains good results on both the MSR-VTT and MSVD datasets.
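The abstract does not give the exact formulation of "local strengthening", but one common way to give nouns and predicate verbs extra weight during caption training is a weighted cross-entropy over the caption tokens. The following is a minimal illustrative sketch of that idea, not the authors' implementation; the function name, the binary `pos_tags` mask, and the `strong_weight` factor are all assumptions introduced here for clarity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def local_strengthening_loss(logits, targets, pos_tags, strong_weight=2.0):
    """Weighted cross-entropy over a caption (illustrative sketch only).

    logits:   list of per-step score lists, one per generated token
    targets:  list of ground-truth token indices, one per step
    pos_tags: 1 for noun/predicate-verb positions, 0 for other words
    Positions tagged 1 contribute strong_weight times as much to the loss,
    which is one plausible reading of "local strengthening".
    """
    total, weight_sum = 0.0, 0.0
    for scores, target, tag in zip(logits, targets, pos_tags):
        prob = softmax(scores)[target]
        w = strong_weight if tag == 1 else 1.0
        total += w * -math.log(prob)
        weight_sum += w
    return total / weight_sum
```

Under this reading, up-weighting noun/verb positions pushes the decoder to fit exactly the tokens that carry object and action information, matching the claimed gains in object recognition and action prediction.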
Pages: 4267-4278
Page count: 12