Vision Transformer and Language Model Based Radiology Report Generation

Cited by: 16
Authors
Mohsan, Mashood Mohammad [1]
Akram, Muhammad Usman [1]
Rasool, Ghulam [2]
Alghamdi, Norah Saleh [3]
Baqai, Muhammad Abdullah Aamer [4]
Abbas, Muhammad [1]
Affiliations
[1] Natl Univ Sci & Technol, Dept Comp & Software Engn, Islamabad 44000, Pakistan
[2] H Lee Moffitt Canc Ctr & Res Inst, Machine Learning Dept, Tampa, FL 33612 USA
[3] Princess Nourah Bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Comp Sci, Riyadh 11671, Saudi Arabia
[4] Michigan State Univ, Coll Engn, E Lansing, MI 48824 USA
Keywords
Vision transformers; language models; radiology report; decoder
DOI
10.1109/ACCESS.2022.3232719
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recent advances in transformers have been applied to computer vision problems, resulting in state-of-the-art models. Transformer-based models have shown remarkable performance in various sequence prediction tasks such as language translation, sentiment classification, and caption generation. Automatic report generation in medical imaging through caption generation models is one applied scenario for language models and has strong social impact. In these models, convolutional neural networks have been used as the encoder to capture spatial information, and recurrent neural networks as the decoder to generate the caption or medical report. However, using a transformer architecture as both encoder and decoder in caption or report writing tasks is still unexplored. In this research, we explore the effect of losing spatial bias information in the encoder by using a pre-trained vanilla image transformer architecture and combining it with different pre-trained language transformers as the decoder. To evaluate the proposed methodology, the Indiana University Chest X-Rays dataset is used, and an ablation study is also conducted with respect to different evaluations. The comparative analysis shows that the proposed methodology performs remarkably well compared with existing techniques across different performance metrics.
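The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind a vanilla vision transformer encoder: the image is cut into fixed-size, non-overlapping patches that are flattened into a token sequence, which is why the encoder no longer has the built-in spatial bias of a convolutional network. The function name `image_to_patch_sequence`, the 4x4 toy image, and the patch size of 2 are hypothetical choices for illustration and are not taken from the paper.

```python
def image_to_patch_sequence(image, patch_size):
    """Split a 2-D grid (list of rows) into flattened, non-overlapping patches.

    Patches are returned in row-major order; each patch is a flat list of
    patch_size * patch_size pixel values. This mirrors only the tokenization
    step of a vision transformer, not the learned projection or attention.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size window into a single token.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Hypothetical toy "image": a 4x4 grid whose pixel values are 0..15.
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = image_to_patch_sequence(toy_image, patch_size=2)
```

In a full model, each flattened patch would then be linearly projected and combined with a position embedding before entering the transformer encoder, and a language-model decoder would attend over the resulting sequence to produce the report text.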
Pages: 1814-1824
Page count: 11