Research on image text generation based on word2vec visual vocabulary attention

Cited by: 0
Authors
Li, Danyang [1]
Zhao, Yahui [1]
Cui, Rongyi [1]
Zhao, Linlin [1]
Affiliations
[1] Yanbian Univ, Intelligent Informat Proc Lab, Dept Comp Sci & Technol, Yanji, Jilin, Peoples R China
Source
2021 ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS TECHNOLOGY AND COMPUTER SCIENCE (ACCTCS 2021) | 2021
Keywords
word2vec; Image2text; Image captions; Attention;
DOI
10.1109/ACCTCS52002.2021.00075
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A method for generating image descriptions is proposed that combines word2vec-based keyword extraction with an attention mechanism. First, for each image in the dataset, the words that co-occur with visual entities in its description set are extracted. Then, similarity is calculated for the extracted keywords, similar words are selected to expand the keyword list, and only words present in the vocabulary are retained to create new descriptions for the images. Finally, an attention mechanism is applied to the test-set images to generate description text. The experiments indicate that the proposed method can annotate images automatically and can effectively address the attention diffusion problem in image text generation.
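The keyword expansion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine` and `expand_keywords`, the similarity threshold of 0.8, and the three-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from a trained word2vec model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def expand_keywords(keywords, embeddings, vocabulary, threshold=0.8):
    """Add words similar to the given keywords, keeping only vocabulary words.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    expanded = list(keywords)
    for kw in keywords:
        if kw not in embeddings:
            continue  # no vector available for this keyword
        for word, vec in embeddings.items():
            if word in expanded or word not in vocabulary:
                continue  # already present, or outside the caption vocabulary
            if cosine(embeddings[kw], vec) >= threshold:
                expanded.append(word)
    return expanded

# Toy vectors standing in for trained word2vec embeddings (assumed values).
toy_embeddings = {
    "dog": [1.0, 0.0, 0.0],
    "puppy": [0.9, 0.1, 0.0],
    "hound": [0.95, 0.05, 0.0],
    "car": [0.0, 1.0, 0.0],
}
vocabulary = {"dog", "puppy", "car"}

result = expand_keywords(["dog"], toy_embeddings, vocabulary)
print(result)
```

In this toy setup, "hound" is close to "dog" but is dropped because it is not in the vocabulary, which mirrors the abstract's step of retaining only in-vocabulary words.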
Pages: 344-348
Page count: 5