Chinese image captioning with fusion encoder and visual keyword search

Cited by: 0

Authors:

Zou, Yang [1]

Liao, Shiyu [1]

Wang, Qifei [1]

Affiliations:

[1] Hohai Univ, Inst Intelligence Sci & Technol, Coll Comp & Informat, Nanjing, Peoples R China

Source:

IET Image Processing | 2024 / Vol. 18 / No. 11

Keywords:

Chinese image captioning; fusion encoder; image retrieval; sentence-level optimization; visual keyword search

DOI:

10.1049/ipr2.13155

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Automatic generation of image captions is essentially a cross-modal conversion from image to text. Owing to the differences in linguistic characteristics between Chinese and English, quite a few Chinese image captioning methods have recently been proposed. Nevertheless, existing Chinese image captioning models usually pay too little attention to local image details or tend to produce generic descriptions. To address these challenges, a Chinese image captioning method is proposed that incorporates a fusion encoder, visual keyword search, and reinforcement learning. The fusion encoder simultaneously extracts local and global features of the input image to enrich the semantic information available in the decoding stage; visual keyword search looks for potential visual words associated with the image content; and the reinforcement learning mechanism optimizes the CIDEr evaluation metric at the sentence level to promote lexical diversity in the image descriptions. The results of extensive experiments demonstrate that the proposed model outperforms the state-of-the-art models and delivers expressive and informative Chinese image captions.


Pages: 3055-3069

Page count: 15

