Chinese image captioning with fusion encoder and visual keyword search

Cited by: 0
Authors
Zou, Yang [1]
Liao, Shiyu [1]
Wang, Qifei [1]
Affiliations
[1] Hohai Univ, Inst Intelligence Sci & Technol, Coll Comp & Informat, Nanjing, Peoples R China
Keywords
Chinese image captioning; fusion encoder; image retrieval; sentence-level optimization; visual keyword search
DOI
10.1049/ipr2.13155
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic generation of image captions is essentially a cross-modal conversion from image to text. Owing to the differences in linguistic characteristics between Chinese and English, quite a few Chinese image captioning methods have recently been proposed. Nevertheless, existing Chinese image captioning models usually lack attention to local image details or tend to produce generic descriptions. To address these challenges, a Chinese image captioning method is proposed that incorporates a fusion encoder, visual keyword search, and reinforcement learning. The fusion encoder simultaneously extracts local and global features of the input image to enrich the semantic information available in the decoding stage; visual keyword search looks for potential visual words associated with the image content; and the reinforcement learning mechanism optimizes the CIDEr evaluation metric at the sentence level to promote the lexical diversity of image descriptions. Extensive experiments demonstrate that the proposed model outperforms state-of-the-art models and delivers expressive and informative Chinese image captions.
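The abstract does not say which reinforcement learning algorithm is used. As a minimal sketch only, assuming a REINFORCE-style policy gradient with a baseline reward (a common way to optimize a sentence-level metric such as CIDEr), the per-caption loss could look like the following. The function name, its parameters, and the numeric values are hypothetical and are not taken from the paper.

```python
import math


def sentence_level_pg_loss(token_log_probs, sampled_reward, baseline_reward):
    """REINFORCE-style loss for one sampled caption (illustrative assumption).

    token_log_probs: log-probabilities the model assigned to each sampled token.
    sampled_reward:  sentence-level score (e.g. CIDEr) of the sampled caption.
    baseline_reward: score of a baseline caption (e.g. a greedy decode),
                     subtracted to reduce gradient variance.
    """
    advantage = sampled_reward - baseline_reward
    # Minimizing this loss raises the probability of captions that beat the
    # baseline and lowers the probability of captions that fall short of it.
    return -advantage * sum(token_log_probs)


# Hypothetical values: a two-token caption scoring above the baseline.
log_probs = [math.log(0.5), math.log(0.25)]
loss = sentence_level_pg_loss(log_probs, sampled_reward=1.2, baseline_reward=0.8)
```

Because the reward is computed on the whole caption rather than per token, the same advantage scales every token's log-probability, which is what makes the optimization sentence-level.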
Pages: 3055-3069
Page count: 15