Chinese image captioning with fusion encoder and visual keyword search

Times cited: 0

Authors:

Zou, Yang [1]

Liao, Shiyu [1]

Wang, Qifei [1]

Affiliation:

[1] Hohai Univ, Inst Intelligence Sci & Technol, Coll Comp & Informat, Nanjing, Peoples R China

Source:

IET IMAGE PROCESSING | 2024 / Vol. 18 / No. 11

Keywords:

Chinese image captioning; fusion encoder; image retrieval; sentence-level optimization; visual keyword search;

DOI:

10.1049/ipr2.13155

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Automatic image captioning is essentially a cross-modal conversion from image to text. Owing to the linguistic differences between Chinese and English, a number of Chinese image captioning methods have recently been proposed. However, existing Chinese image captioning models often pay little attention to local image details or tend to produce generic descriptions. To address these issues, a Chinese image captioning method is proposed that combines a fusion encoder, visual keyword search, and reinforcement learning. The fusion encoder extracts both local and global features of the input image to enrich the semantic information available during decoding; visual keyword search looks for potential visual words associated with the image content; and the reinforcement learning mechanism optimizes the CIDEr evaluation metric at the sentence level to promote lexical diversity in the generated descriptions. Extensive experiments show that the proposed model outperforms state-of-the-art models and produces expressive, informative Chinese image captions.
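The abstract says that reinforcement learning optimizes CIDEr at the sentence level but does not state which estimator is used. A common choice for this kind of objective is self-critical sequence training (SCST), where the reward of a sampled caption is baselined by the reward of the model's own greedy-decoded caption. The Python sketch below assumes SCST; `scst_loss` and `toy_reward` are hypothetical names, and `toy_reward` is a unigram-F1 stand-in for CIDEr (real CIDEr uses TF-IDF-weighted n-gram similarity against the reference set). Captions are assumed to be pre-segmented into Chinese word tokens.

```python
from collections import Counter
from typing import Callable, Sequence

Caption = Sequence[str]  # a caption pre-segmented into word tokens


def toy_reward(candidate: Caption, references: Sequence[Caption]) -> float:
    """Hypothetical stand-in for CIDEr: best unigram F1 against any reference."""
    cand_counts = Counter(candidate)
    best = 0.0
    for ref in references:
        # Clipped token overlap between the candidate and this reference
        overlap = sum((cand_counts & Counter(ref)).values())
        if overlap == 0:
            continue
        precision = overlap / len(candidate)
        recall = overlap / len(ref)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


def scst_loss(
    sampled_logprobs: Sequence[float],
    sampled: Caption,
    greedy: Caption,
    references: Sequence[Caption],
    reward_fn: Callable[[Caption, Sequence[Caption]], float] = toy_reward,
) -> float:
    """Self-critical policy-gradient loss for one image (sketch, assumes SCST).

    sampled_logprobs holds log p(w_t) for each token of the sampled caption.
    The greedy caption's reward is the baseline, so only samples that beat
    the model's own test-time output are reinforced.
    """
    advantage = reward_fn(sampled, references) - reward_fn(greedy, references)
    # Minimising this raises the sampled tokens' log-probabilities when
    # advantage > 0 and lowers them when advantage < 0 (REINFORCE with baseline).
    return -advantage * sum(sampled_logprobs)


refs = [["一只", "狗", "在", "草地", "上", "奔跑"]]
sampled = ["一只", "狗", "在", "草地", "上", "奔跑"]
greedy = ["一只", "狗"]
loss = scst_loss([-0.5] * 6, sampled, greedy, refs)
print(loss)  # 1.5
```

Here the sampled caption matches the reference exactly (reward 1.0) while the shorter greedy caption scores 0.5, so the advantage is 0.5 and the loss is positive. In an actual trainer, `sampled_logprobs` would be autograd tensors from the decoder, and the reward function would be a CIDEr implementation rather than this placeholder.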

Pages: 3055-3069

Number of pages: 15