Continually Tuning a Large Language Model for Multi-domain Radiology Report Generation

Cited by: 1
Authors
Sun, Yihua [1 ]
Khor, Hee Guan [1 ]
Wang, Yuanzheng [2 ]
Wang, Zhuhao [1 ]
Zhao, Hongliang [2 ]
Zhang, Yu [2 ]
Ma, Longfei [1 ]
Zheng, Zhuozhao [2 ]
Liao, Hongen [1 ]
Affiliations
[1] Tsinghua Univ, Sch Biomed Engn, Beijing, Peoples R China
[2] Tsinghua Univ, Beijing Tsinghua Changgung Hosp, Sch Clin Med, Dept Radiol, Beijing, Peoples R China
Source
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT V | 2024 / Vol. 15005
Funding
National Natural Science Foundation of China;
Keywords
Continual learning; Large language model; Multi-domain; Multi-modality; Parameter efficient fine-tuning; Report generation;
DOI
10.1007/978-3-031-72086-4_17
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large language models (LLMs) have demonstrated potential across various tasks, including vision-language applications such as chest X-ray (XR) report generation (RG) in healthcare. Recent RG approaches focus on optimizing model performance for a single dataset with a single XR modality, often neglecting the critical area of computed tomography (CT) report generation. The challenge is compounded by medical datasets being isolated across different centers, which makes comprehensive data collection difficult. Furthermore, LLMs trained on datasets sequentially can suffer catastrophic forgetting. In this paper, we move beyond the conventional approach of training on a single dataset and focus on improving overall performance on sequentially collected multi-center datasets. We incorporate four datasets with diverse languages and image modalities in our experiments. Our approach introduces a minimal number of task-specific learnable weights for each domain within an LLM-based RG method, keeping the majority of weights frozen to avoid forgetting. Exploiting the multilingual generalizability of LLMs, we align models and facilitate knowledge sharing through a multi-label supervised contrastive loss in the LLM hidden space. We design a 2D-3D adapter for the image encoder to transfer from XR to CT RG tasks, and establish a CT disease graph that transfers knowledge from XR to CT by using each CT disease's most relevant XR disease class centers in a triplet loss. Extensive experiments validate our design.
Pages: 177-187
Page count: 11