Medical Report Generation from Medical Images Using Vision Transformer and Bart Deep Learning Architectures

Cited by: 0

Authors:

Ucan, Murat [1]

Kaya, Buket [2]

Kaya, Mehmet [3]

Alhajj, Reda [4]

Affiliations:

[1] Dicle Univ, Dept Comp Technol, Diyarbakir, Turkiye

[2] Firat Univ, Dept Elect & Automat, Elazig, Turkiye

[3] Firat Univ, Dept Comp Engn, Elazig, Turkiye

[4] Univ Calgary, Dept Comp Sci, Calgary, AB, Canada

Source:

SOCIAL NETWORKS ANALYSIS AND MINING, ASONAM 2024, PT IV | 2025 / Vol. 15214

Keywords:

Deep Learning; Vision Transformer; ViT; Bidirectional Autoregressive Transformer; BART; Medical Report Generation; Chest X-rays;

DOI:

10.1007/978-3-031-78554-2_17

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Generating medical reports from medical images by traditional methods is time-consuming, prone to human error, and dependent on experience. Delays in producing reports from medical images postpone patient treatment, and misdiagnosis can lead to adverse outcomes, including patient death. The main objective of this study is to develop a high-performance deep learning model that can autonomously generate medical reports from medical images. The proposed model consists of a Vision Transformer (ViT) encoder and a Bidirectional Autoregressive Transformer (BART) decoder. The model was trained and tested using images and reports from the Indiana University Chest X-Ray dataset. It was then evaluated quantitatively and compared with competing approaches in the literature that use the same dataset. The proposed Vi-Ba architecture achieved scores of 0.150, 0.154, and 0.274 on the BLEU-4, METEOR, and ROUGE word-matching evaluation metrics, respectively. The Vi-Ba model achieved high reporting performance compared with the studies reviewed in the literature. The results show that the proposed architecture can be used by specialist physicians in hospitals to diagnose diseases faster and more accurately. In this way, misdiagnoses and incorrect treatments will be reduced and human life will be protected.
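
To illustrate what a word-matching metric of this kind measures, the following is a minimal sketch of a ROUGE-L style score based on the longest common subsequence of tokens. The abstract does not say which ROUGE variant or tokenization the authors used, so the choice of ROUGE-L, the balanced F1 weighting, the lowercase whitespace tokenization, and the function names `lcs_length` and `rouge_l_f1` are all assumptions made for illustration, not the paper's evaluation code.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                # Extend the subsequence that ended before both tokens.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the best length seen so far.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L style F1 score between a reference and a generated report.

    Assumption: lowercase whitespace tokenization and a balanced F1,
    which may differ from the setup used in the paper.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of generated tokens that match
    recall = lcs / len(ref)      # share of reference tokens recovered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical report sentences, not taken from the dataset.
    reference = "no acute cardiopulmonary abnormality"
    candidate = "no acute abnormality"
    print(round(rouge_l_f1(reference, candidate), 3))
```

A score of 1.0 means the generated report reproduces the reference token sequence exactly, while 0.0 means the two share no tokens. In practice, an established evaluation package would normally be preferred over a hand-written scorer so that results stay comparable across studies.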


Pages: 257-267

Page count: 11

