Chinese image captioning with fusion encoder and visual keyword search

Cited by: 0

Authors:

Zou, Yang [1]

Liao, Shiyu [1]

Wang, Qifei [1]

Affiliations:

[1] Hohai Univ, Inst Intelligence Sci & Technol, Coll Comp & Informat, Nanjing, Peoples R China

Source:

IET IMAGE PROCESSING | 2024 / Vol. 18 / Issue 11

Keywords:

Chinese image captioning; fusion encoder; image retrieval; sentence-level optimization; visual keyword search;

DOI:

10.1049/ipr2.13155

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Automatic generation of image captions is essentially a cross-modal conversion from image to text. Owing to the differences in linguistic characteristics between Chinese and English, quite a few Chinese image captioning methods have recently been proposed. Nevertheless, existing Chinese image captioning models usually pay too little attention to local image details or tend to produce generic descriptions. To address these challenges, a Chinese image captioning method is proposed that incorporates a fusion encoder, visual keyword search, and reinforcement learning. The fusion encoder simultaneously extracts local and global features of the input image to enrich the semantic information available in the decoding stage; visual keyword search looks for potential visual words associated with the image content; and the reinforcement learning mechanism optimizes the evaluation metric CIDEr at the sentence level to promote the lexical diversity of the image descriptions. The results of extensive experiments demonstrate that the proposed model outperforms state-of-the-art models and delivers expressive and informative Chinese image captions.


Pages: 3055-3069

Page count: 15
