Contextualized Keyword Representations for Multi-modal Retinal Image Captioning

Cited by: 11
Authors
Huang, Jia-Hong [1 ]
Wu, Ting-Wei [2 ]
Worring, Marcel [1 ]
Affiliations
[1] Univ Amsterdam, Amsterdam, Netherlands
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21) | 2021
Keywords
OPTIC-NERVE; CLASSIFICATION;
DOI
10.1145/3460426.3463667
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Medical image captioning automatically generates a medical description of the content of a given medical image. Traditional medical image captioning models generate a description from a single medical image input only, so abstract medical descriptions or concepts are difficult to produce, which limits the effectiveness of such models. Multi-modal medical image captioning is one approach to addressing this problem: textual input, e.g., expert-defined keywords, serves as one of the main drivers of description generation. Effectively encoding both the textual input and the medical image is therefore important for multi-modal medical image captioning. In this work, a new end-to-end deep multi-modal medical image captioning model is proposed, built on contextualized keyword representations, textual feature reinforcement, and masked self-attention. Evaluated on an existing multi-modal medical image captioning dataset, the proposed model is effective, improving BLEU-avg by +53.2% and CIDEr by +18.6% over the state-of-the-art method.
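The abstract names masked self-attention as one of the model's building blocks. As a rough illustration only (not the authors' implementation; all names and shapes here are assumptions), the following numpy sketch shows single-head scaled dot-product self-attention with a causal mask, the kind of masking typically used when decoding a caption token by token:

```python
import numpy as np

def masked_self_attention(x, mask=None):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x:    (seq_len, d) token representations.
    mask: (seq_len, seq_len) boolean; mask[i, j] = True hides position j
          when computing the output for position i.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)              # pairwise similarities
    if mask is not None:
        scores = np.where(mask, -1e9, scores)  # block masked positions
    # softmax over the key dimension (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x                         # contextualized outputs

# Causal mask: each position attends only to itself and earlier positions,
# so future caption tokens cannot leak into the current prediction.
seq_len, d = 4, 8
causal_mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
x = np.random.default_rng(0).normal(size=(seq_len, d))
out = masked_self_attention(x, causal_mask)
```

With the causal mask, the first position can attend only to itself, so its output equals its input; later positions mix in progressively more context. In the paper's setting, such a masked decoder would consume fused image and keyword features rather than random vectors.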
Pages: 645-652 (8 pages)