Transformer-based image captioning by leveraging sentence information

Cited by: 0
Authors
Chahkandi, Vahid [1]
Fadaeieslam, Mohammad Javad [1]
Yaghmaee, Farzin [1]
Affiliations
[1] Semnan Univ, Fac Elect & Comp Engn, Semnan, Iran
Keywords
image captioning; nonautoregressive; attention; transformer; MODELS;
DOI
10.1117/1.JEI.31.4.043005
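Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract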
Although autoregressive image captioning methods yield good-quality image descriptions, their sequential structure slows sentence generation. To overcome this shortcoming, some nonautoregressive models have been proposed, but the sentences they produce are of lower quality than those obtained with autoregressive methods. We design a new structure based on nonautoregressive methods that not only finds better relations between sentence words and salient image objects but also combines this information with positional information extracted from the sentence to generate a higher-quality target sentence. Experimental results on the standard benchmark show that our proposed model performs better than general nonautoregressive captioning models.
Pages: 15
Related papers
47 in total
  • [1] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
    Anderson, Peter
    He, Xiaodong
    Buehler, Chris
    Teney, Damien
    Johnson, Mark
    Gould, Stephen
    Zhang, Lei
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6077 - 6086
  • [2] SPICE: Semantic Propositional Image Caption Evaluation
    Anderson, Peter
    Fernando, Basura
    Johnson, Mark
    Gould, Stephen
    [J]. COMPUTER VISION - ECCV 2016, PT V, 2016, 9909 : 382 - 398
  • [3] [Anonymous], 2011, ADV NEURAL INFORM PR
  • [4] Ba JL., 2016, ARXIV
  • [5] Banerjee S., 2005, P ACL WORKSH INTR EX, P65
  • [6] Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures
    Bernardi, Raffaella
    Cakici, Ruket
    Elliott, Desmond
    Erdem, Aykut
    Erdem, Erkut
    Ikizler-Cinbis, Nazli
    Keller, Frank
    Muscat, Adrian
    Plank, Barbara
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2016, 55 : 409 - 442
  • [7] Improvement of image description using bidirectional LSTM
    Chahkandi, Vahid
    Fadaeieslam, Mohammad Javad
    Yaghmaee, Farzin
    [J]. INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2018, 7 (03) : 147 - 155
  • [8] Chen K, 2016, arXiv, DOI 10.48550/arXiv.1511.05960
  • [9] Chen XL, 2015, arXiv, arXiv:1504.00325
  • [10] Devlin J, 2015, PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL) AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (IJCNLP), VOL 2, P100