Understanding the performance of knowledge graph embeddings in drug discovery

Cited by: 34

Authors:

Bonner, Stephen [1]

Barrett, Ian P. [1]

Ye, Cheng [1]

Swiers, Rowan [1]

Engkvist, Ola [2]

Hoyt, Charles Tapley [3]

Hamilton, William L. [4, 5]

Affiliations:

[1] AstraZeneca, Data Sci & Quantitat Biol, Discovery Sci, Cambridge, England

[2] AstraZeneca, Mol AI, Discovery Sci, R&D, Gothenburg, Sweden

[3] Harvard Med Sch, Lab Syst Pharmacol, Boston, MA USA

[4] McGill Univ, Sch Comp Sci, Montreal, PQ, Canada

[5] Mila Quebec AI Inst, Montreal, PQ, Canada

Source:

ARTIFICIAL INTELLIGENCE IN THE LIFE SCIENCES | 2022 / Volume 2

Keywords:

Drug discovery; Knowledge graph embedding; Knowledge graphs;

DOI:

10.1016/j.ailsci.2022.100036

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]; Q7 [Molecular biology];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Knowledge Graphs (KGs) and associated Knowledge Graph Embedding (KGE) models have recently begun to be explored in the context of drug discovery and have the potential to assist in key challenges such as target identification. In the drug discovery domain, KGs can be employed as part of a process which can result in lab-based experiments being performed, or impact on other decisions, incurring significant time and financial costs and, most importantly, ultimately influencing patient healthcare. For KGE models to have impact in this domain, a better understanding not only of performance, but also of the various factors which determine it, is required. In this study we investigate, over the course of many thousands of experiments, the predictive performance of five KGE models on two public drug discovery-oriented KGs. Our goal is not to focus on the best overall model or configuration; instead, we take a deeper look at how performance can be affected by changes in the training setup, choice of hyperparameters, model parameter initialisation seed and different splits of the datasets. Our results highlight that these factors have a significant impact on performance and can even affect the ranking of models. Indeed, these factors should be reported along with model architectures to ensure complete reproducibility and fair comparisons of future work, and we argue this is critical for the acceptance, use and impact of KGEs in a biomedical setting.
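The abstract's point that the initialisation seed can change the ranking of models can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the model names, the metric values and the helper functions are hypothetical and are not taken from the paper or from any KGE library.

```python
from statistics import mean, stdev

# Hypothetical metric values (NOT results from the paper): one score per
# random seed for each of two placeholder models.
scores = {
    "model_a": [0.31, 0.27, 0.33, 0.29],
    "model_b": [0.30, 0.30, 0.29, 0.31],
}


def rank_for_seed(scores, seed_index):
    """Return model names ordered best-first for a single seed."""
    return sorted(scores, key=lambda m: scores[m][seed_index], reverse=True)


def summarise(scores):
    """Return (mean, sample standard deviation) per model across seeds."""
    return {m: (mean(v), stdev(v)) for m, v in scores.items()}


def distinct_rankings(scores):
    """Collect the set of distinct model orderings observed across seeds."""
    n_seeds = len(next(iter(scores.values())))
    return {tuple(rank_for_seed(scores, i)) for i in range(n_seeds)}


if __name__ == "__main__":
    for model, (avg, spread) in summarise(scores).items():
        print(f"{model}: mean={avg:.3f} sd={spread:.3f}")
    print("distinct rankings across seeds:", len(distinct_rankings(scores)))
```

With these made-up numbers, a comparison based on any single seed would name a winner, while the per-seed orderings disagree with each other. Reporting the mean and spread across seeds, as in `summarise`, is one way to make such comparisons more transparent.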

Pages: 12
