Chinese image captioning with fusion encoder and visual keyword search

Authors
Zou, Yang [1]
Liao, Shiyu [1]
Wang, Qifei [1]
Affiliations
[1] Hohai Univ, Inst Intelligence Sci & Technol, Coll Comp & Informat, Nanjing, Peoples R China
Keywords
Chinese image captioning; fusion encoder; image retrieval; sentence-level optimization; visual keyword search
DOI
10.1049/ipr2.13155
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Automatic generation of image captions is essentially a cross-modal conversion from image to text. Owing to the differences in linguistic characteristics between Chinese and English, quite a few Chinese image captioning methods have recently been proposed. Nevertheless, existing Chinese image captioning models usually pay too little attention to local image details or tend to produce generic descriptions. To address these challenges, a Chinese image captioning method is proposed that incorporates a fusion encoder, visual keyword search, and reinforcement learning. The fusion encoder simultaneously extracts local and global features of the input image to enrich the semantic information available in the decoding stage, visual keyword search retrieves potential visual words associated with the image content, and the reinforcement learning mechanism optimizes the evaluation metric CIDEr at the sentence level to promote the lexical diversity of the generated descriptions. The results of extensive experiments demonstrate that the proposed model outperforms the state-of-the-art models and delivers expressive and informative Chinese image captions.
Pages: 3055-3069
Page count: 15