Chinese image captioning with fusion encoder and visual keyword search

Cited by: 0
Authors
Zou, Yang [1]
Liao, Shiyu [1]
Wang, Qifei [1]
Affiliation
[1] Hohai Univ, Inst Intelligence Sci & Technol, Coll Comp & Informat, Nanjing, Peoples R China
Keywords
Chinese image captioning; fusion encoder; image retrieval; sentence-level optimization; visual keyword search
DOI
10.1049/ipr2.13155
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Automatic image captioning is essentially a cross-modal conversion from image to text. Because the linguistic characteristics of Chinese differ from those of English, several Chinese image captioning methods have recently been proposed. However, existing Chinese image captioning models often pay little attention to local image details or tend to produce generic descriptions. To address these problems, a Chinese image captioning method is proposed that combines a fusion encoder, visual keyword search, and reinforcement learning. The fusion encoder extracts both local and global features of the input image to enrich the semantic information available during decoding; visual keyword search looks for potential visual words associated with the image content; and the reinforcement learning mechanism optimizes the CIDEr evaluation metric at the sentence level to increase the lexical diversity of the generated descriptions. Extensive experiments show that the proposed model outperforms state-of-the-art models and produces expressive, informative Chinese image captions.
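The abstract does not say how the sentence-level CIDEr optimization is implemented. A widely used scheme for this kind of objective is self-critical sequence training (SCST): a caption is sampled from the model, scored with a sentence-level metric such as CIDEr, and the score of the model's own greedy-decoded caption is subtracted as a baseline before applying a REINFORCE-style policy gradient. The sketch below assumes that scheme; it is not confirmed to be the authors' method. The function name `self_critical_loss` and its arguments are hypothetical, the CIDEr rewards are taken as given inputs (no scorer is implemented), and plain floats stand in for the tensors an autodiff framework would use so that gradients can flow through the log-probabilities.

```python
from typing import Sequence


def self_critical_loss(
    sampled_rewards: Sequence[float],
    greedy_rewards: Sequence[float],
    sampled_logprobs: Sequence[Sequence[float]],
) -> float:
    """Hypothetical sentence-level policy-gradient loss with a greedy baseline.

    Assumes an SCST-style setup (not confirmed by the abstract):
      sampled_rewards[i]  - sentence-level score (e.g. CIDEr) of the i-th sampled caption
      greedy_rewards[i]   - score of the greedy caption for the same image (the baseline)
      sampled_logprobs[i] - per-token log-probabilities of the i-th sampled caption
    Returns the batch-averaged loss as a plain float.
    """
    n = len(sampled_rewards)
    if n == 0:
        raise ValueError("batch must not be empty")
    if not (n == len(greedy_rewards) == len(sampled_logprobs)):
        raise ValueError("batch dimensions must match")

    total = 0.0
    for r_sample, r_greedy, logps in zip(sampled_rewards, greedy_rewards, sampled_logprobs):
        # Advantage is positive only when the sample beats the model's own greedy output.
        advantage = r_sample - r_greedy
        # REINFORCE term: -(r - b) * log p(caption), with log p summed over tokens.
        total += -advantage * sum(logps)
    return total / n


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the paper.
    loss = self_critical_loss(
        sampled_rewards=[1.2, 0.5],
        greedy_rewards=[1.0, 0.5],
        sampled_logprobs=[[-1.0, -2.0], [-0.5]],
    )
    print(round(loss, 6))
```

Using the greedy caption's score as the baseline means that only samples that outperform the model's test-time output are reinforced, and samples that score worse are pushed down. This reduces gradient variance without training a separate critic network.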
Pages: 3055-3069
Number of pages: 15