Enhancing Clinical Trial Summarization: Leveraging Large Language Models and Knowledge Graphs for Entity Preservation

Cited by: 0
Authors
Nahed, Pouyan [1]
Kambar, Mina Esmail Zadeh Nojoo [1]
Taghva, Kazem [1]
Affiliations
[1] Univ Nevada, Dept Comp Sci, Las Vegas, NV 89154 USA
Source
PROCEEDINGS OF NINTH INTERNATIONAL CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGY, ICICT 2024, VOL 7 | 2024 / Vol. 1003
Funding
U.S. National Science Foundation;
Keywords
Large language models; Clinical data; Summarization; Named entity preservation; Knowledge graph;
DOI
10.1007/978-981-97-3302-6_26
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
ClinicalTrials.gov is an accessible online medical resource for researchers, healthcare professionals, and policy designers seeking detailed information on clinical trials. Summarizing these long clinical records can significantly reduce the time database users need, because summarization condenses comprehensive information into concise synopses while preserving the essential meaning and facilitating understanding. In this paper, we employ the Bidirectional and Auto-Regressive Transformers (BART) model to generate brief summaries of the trials. Our contributions include new preprocessing techniques for model training, which lead to a robust summarization model. Compared to previous studies, the fine-tuned model significantly improves ROUGE-1, ROUGE-2, and ROUGE-L F1-scores by 14%, 23%, and 20%, respectively. Additionally, we present an innovative knowledge graph based on entity classes to assess the generated summaries. This graph not only quantifies the essential entities carried over from the original text into the summaries but also provides insights into their specific order and arrangement in sentences.
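For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how a ROUGE-N F1 score and a simple entity-preservation ratio could be computed. It is an illustration under stated assumptions, not the authors' evaluation code: tokenization is plain whitespace splitting with no stemming, entities are matched as verbatim substrings, and the function names, example sentences, and entity list are all hypothetical. The knowledge-graph assessment described in the abstract is not reproduced here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference, candidate, n=1):
    """ROUGE-N F1 from clipped n-gram overlap on whitespace tokens.

    Assumption: lowercasing plus whitespace splitting; real ROUGE
    toolkits usually add stemming and more careful tokenization.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # counts clipped to the minimum
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def entity_preservation(source_entities, summary):
    """Fraction of source entities that appear verbatim in the summary."""
    if not source_entities:
        return 0.0
    text = summary.lower()
    kept = [e for e in source_entities if e.lower() in text]
    return len(kept) / len(source_entities)


# Hypothetical example data, not taken from the paper.
reference = "the trial tests aspirin in adults with diabetes"
candidate = "the trial tests aspirin in adults"
entities = ["aspirin", "diabetes"]

print(round(rouge_n_f1(reference, candidate, n=1), 3))
print(round(rouge_n_f1(reference, candidate, n=2), 3))
print(entity_preservation(entities, candidate))
```

In this toy case the candidate drops the entity "diabetes", so the preservation ratio falls even though the n-gram precision stays high; this is the kind of gap an entity-focused assessment is meant to expose.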
Pages: 325-336
Number of pages: 12