Research on image text generation based on word2vec visual vocabulary attention

Cited by: 0

Authors:

Li, Danyang [1]

Zhao, Yahui [1]

Cui, Rongyi [1]

Zhao, Linlin [1]

Affiliation:

[1] Yanbian Univ, Intelligent Informat Proc Lab, Dept Comp Sci & Technol, Yanji, Jilin, Peoples R China

Source:

2021 ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS TECHNOLOGY AND COMPUTER SCIENCE (ACCTCS 2021) | 2021

Keywords:

word2vec; Image2text; Image captions; Attention

DOI:

10.1109/ACCTCS52002.2021.00075

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A method of image text generation that combines word2vec keyword extraction with an attention mechanism is proposed. First, for each image in the dataset, the words that co-occur with visual entities in its description set are extracted. Then, similarity is computed for the extracted keywords, similar words are selected to expand the keyword list, and only the words present in the vocabulary are retained to create new descriptions for the images. Finally, an attention mechanism is applied to the test-set images to generate description text. Experiments show that the proposed method can annotate images automatically and can effectively address the attention diffusion problem in image text generation.
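The keyword-expansion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 3-dimensional vectors stand in for trained word2vec embeddings, and the names `EMBEDDINGS`, `cosine`, `expand_keywords`, and the `threshold` value are all hypothetical choices made for illustration. The embedding table doubles as the vocabulary, so only in-vocabulary words can be added.

```python
import math

# Toy vectors standing in for trained word2vec embeddings (made-up values).
# The keys also act as the vocabulary: words outside it are never added.
EMBEDDINGS = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.85, 0.2, 0.05],
    "ball": [0.1, 0.9, 0.1],
    "frisbee": [0.15, 0.85, 0.2],
    "sky": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(keywords, embeddings, threshold=0.95):
    """Append vocabulary words whose similarity to any keyword meets the threshold."""
    expanded = list(keywords)
    for kw in keywords:
        if kw not in embeddings:
            continue  # keyword has no embedding, so nothing to compare against
        for word, vec in embeddings.items():
            if word not in expanded and cosine(embeddings[kw], vec) >= threshold:
                expanded.append(word)
    return expanded


# Keywords assumed to have been extracted from one image's description set.
expanded = expand_keywords(["dog", "ball"], EMBEDDINGS)
print(expanded)
```

With these toy values, "puppy" and "frisbee" are close enough to the seed keywords to be added, while "sky" is not. In practice the vectors would come from a word2vec model trained on the caption corpus, and the threshold would need tuning.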

Pages: 344-348

Page count: 5
