Transformer-based image captioning by leveraging sentence information

Cited by: 0

Authors:

Chahkandi, Vahid [1]

Fadaeieslam, Mohammad Javad [1]

Yaghmaee, Farzin [1]

Affiliation:

[1] Semnan Univ, Fac Elect & Comp Engn, Semnan, Iran

Source:

JOURNAL OF ELECTRONIC IMAGING | 2022 / Vol. 31 / Issue 04

Keywords:

image captioning; nonautoregressive; attention; transformer; MODELS;

DOI:

10.1117/1.JEI.31.4.043005

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Although autoregressive image captioning methods yield good-quality image descriptions, their sequential structure slows down sentence generation. To overcome this shortcoming, some nonautoregressive models have been proposed, but the quality of the sentences they produce is lower than that of autoregressive methods. We have designed a new structure based on nonautoregressive methods that not only finds better relations between sentence words and salient image objects but also combines this information with positional information extracted from the sentence to generate a higher-quality target sentence. Experimental results on the standard benchmark show that our proposed model outperforms general nonautoregressive captioning models.


Pages: 15
