Contextualized Keyword Representations for Multi-modal Retinal Image Captioning

Cited by: 11
Authors
Huang, Jia-Hong [1 ]
Wu, Ting-Wei [2 ]
Worring, Marcel [1 ]
Affiliations
[1] Univ Amsterdam, Amsterdam, Netherlands
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21) | 2021
Keywords
OPTIC-NERVE; CLASSIFICATION;
DOI
10.1145/3460426.3463667
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Medical image captioning automatically generates a medical description of the content of a given medical image. Traditional medical image captioning models generate a description from a single medical image input only, so abstract medical descriptions or concepts are difficult to produce, which limits the effectiveness of such models. Multi-modal medical image captioning is one approach to addressing this problem: textual input, e.g., expert-defined keywords, serves as one of the main drivers of description generation. Effectively encoding both the textual input and the medical image is therefore important for multi-modal medical image captioning. In this work, a new end-to-end deep multi-modal medical image captioning model is proposed, built on contextualized keyword representations, textual feature reinforcement, and masked self-attention. Evaluated on an existing multi-modal medical image captioning dataset, the proposed model is effective, improving BLEU-avg by +53.2% and CIDEr by +18.6% over the state-of-the-art method.
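The abstract names masked self-attention as one of the model's building blocks. As a rough illustration only (not the authors' implementation; all names and shapes here are assumptions), the following numpy sketch shows single-head scaled dot-product self-attention with a causal mask, the kind of masking typically used when decoding a caption token by token:

```python
import numpy as np

def masked_self_attention(x, mask=None):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x:    (seq_len, d) token representations.
    mask: (seq_len, seq_len) boolean; mask[i, j] = True hides position j
          when computing the output for position i.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)              # pairwise similarities
    if mask is not None:
        scores = np.where(mask, -1e9, scores)  # block masked positions
    # softmax over the key dimension (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x                         # contextualized outputs

# Causal mask: each position attends only to itself and earlier positions,
# so future caption tokens cannot leak into the current prediction.
seq_len, d = 4, 8
causal_mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
x = np.random.default_rng(0).normal(size=(seq_len, d))
out = masked_self_attention(x, causal_mask)
```

With the causal mask, the first position can attend only to itself, so its output equals its input; later positions mix in progressively more context. In the paper's setting, such a masked decoder would consume fused image and keyword features rather than random vectors.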
Pages: 645-652 (8 pages)