BioKGrapher: Initial evaluation of automated knowledge graph construction from biomedical literature

Cited by: 0
Authors
Schaefer, Henning [1, 2]
Idrissi-Yaghir, Ahmad [2, 3]
Arzideh, Kamyar [4]
Damm, Hendrik [2, 3]
Pakull, Tabea M. G. [1, 2]
Schmidt, Cynthia S. [1, 4]
Bahn, Mikel [4]
Lodde, Georg [6]
Livingstone, Elisabeth [6]
Schadendorf, Dirk [6]
Nensa, Felix [4, 5]
Horn, Peter A. [1]
Friedrich, Christoph M. [2, 3]
Affiliations
[1] Univ Hosp Essen, Inst Transfus Med, Hufelandstr 55, D-45147 Essen, Germany
[2] Univ Appl Sci & Arts Dortmund FHDO, Dept Comp Sci, Emil Figge Str 42, D-44227 Dortmund, Germany
[3] Univ Hosp Essen, Inst Med Informat Biometry & Epidemiol IMIBE, Hufelandstr 55, D-45147 Essen, Germany
[4] Univ Hosp Essen, Inst Med IKIM, Girardetstr 2, D-45131 Essen, Germany
[5] Univ Hosp Essen, Inst Intervent & Diagnost Radiol & Neuroradiol, Hufelandstr 55, D-45147 Essen, Germany
[6] Univ Hosp Essen, Dept Dermatol, Hufelandstr 55, D-45147 Essen, Germany
Source
COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL | 2024 / Volume 24
Keywords
Knowledge graph; Named entity recognition; Entity linking; Clinical guidelines; Software; B-CELL LYMPHOMA; RITUXIMAB THERAPY; GASTROESOPHAGEAL JUNCTION; SCIENTIFIC LITERATURE; COMBINED NIVOLUMAB; CHEMOTHERAPY; HALLMARKS; CANCER; SYSTEM; TRIAL
DOI
10.1016/j.csbj.2024.10.017
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Background: The growth of biomedical literature presents challenges in extracting and structuring knowledge. Knowledge Graphs (KGs) offer a solution by representing relationships between biomedical entities. However, manual construction of KGs is labor-intensive and time-consuming, highlighting the need for automated methods. This work introduces BioKGrapher, a tool for automatic KG construction using large-scale publication data, with a focus on biomedical concepts related to specific medical conditions. BioKGrapher allows researchers to construct KGs from PubMed IDs.
Methods: The BioKGrapher pipeline begins with Named Entity Recognition and Linking (NER+NEL) to extract and normalize biomedical concepts from PubMed, mapping them to the Unified Medical Language System (UMLS). Extracted concepts are weighted and re-ranked using Kullback-Leibler divergence and local frequency balancing. These concepts are then integrated into hierarchical KGs, with relationships formed using terminologies like SNOMED CT and NCIt. Downstream applications include multi-label document classification using Adapter-infused Transformer models.
Results: BioKGrapher effectively aligns generated concepts with clinical practice guidelines from the German Guideline Program in Oncology (GGPO), achieving F1-Scores of up to 0.6. In multi-label classification, Adapter-infused models using a BioKGrapher cancer-specific KG improved micro F1-Scores by up to 0.89 percentage points over a non-specific KG and 2.16 points over base models across three BERT variants. The drug-disease extraction case study identified indications for Nivolumab and Rituximab.
Conclusion: BioKGrapher is a tool for automatic KG construction, aligning with the GGPO and enhancing downstream task performance. It offers a scalable solution for managing biomedical knowledge, with potential applications in literature recommendation, decision support, and drug repurposing.
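The abstract states that extracted concepts are weighted and re-ranked using Kullback-Leibler divergence. As a rough illustration of that idea only, the sketch below scores each concept by its pointwise contribution to KL(domain || background), so that concepts over-represented in a condition-specific corpus rank highest. This is not the BioKGrapher implementation: the function names, the additive smoothing parameter `alpha`, and the toy counts are all assumptions made for this example, and the "local frequency balancing" step is not modeled because the abstract does not specify it.

```python
import math
from collections import Counter


def kl_concept_scores(domain_counts, background_counts, alpha=1.0):
    """Pointwise contribution of each concept to KL(domain || background).

    alpha is additive smoothing (an assumption of this sketch) so that
    concepts missing from one corpus do not yield log(0) or division by zero.
    """
    vocab = set(domain_counts) | set(background_counts)
    d_total = sum(domain_counts.values()) + alpha * len(vocab)
    b_total = sum(background_counts.values()) + alpha * len(vocab)
    scores = {}
    for concept in vocab:
        p = (domain_counts.get(concept, 0) + alpha) / d_total
        q = (background_counts.get(concept, 0) + alpha) / b_total
        scores[concept] = p * math.log(p / q)
    return scores


def rank_concepts(scores):
    """Return concepts ordered from most to least domain-specific."""
    return sorted(scores, key=scores.get, reverse=True)


# Toy, made-up concept counts (not taken from the paper).
domain = Counter({"Melanoma": 40, "Nivolumab": 25, "Patient": 35})
background = Counter({"Melanoma": 5, "Nivolumab": 2, "Patient": 93})

scores = kl_concept_scores(domain, background)
ranking = rank_concepts(scores)
print(ranking)
```

In this toy setting a generic concept such as "Patient" receives a negative score because it is relatively more frequent in the background corpus, while the scores summed over all concepts equal the (non-negative) KL divergence between the two smoothed distributions.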
Pages: 639-660
Page count: 22