Comprehensive evaluation of pure and hybrid collaborative filtering in drug repurposing

Cited by: 0
Authors
Reda, Clemence [1]
Vie, Jill-Jenn [2]
Wolkenhauer, Olaf [1, 3, 4]
Affiliations
[1] Univ Rostock, Dept Syst Biol & Bioinformat, D-18051 Rostock, Germany
[2] INRIA Saclay, Soda Team, F-91120 Palaiseau, France
[3] Leibniz Inst Food Syst Biol, D-85354 Freising Weihenstephan, Germany
[4] Stellenbosch Inst Adv Study, Wallenberg Res Ctr, ZA-7602 Stellenbosch, South Africa
Source
SCIENTIFIC REPORTS | 2025 / Vol. 15 / Issue 01
Keywords
Drug repositioning; Drug repurposing; Collaborative filtering; Benchmark; Matrix factorization; FALSE DISCOVERY RATE; SIMILARITY; ONTOLOGY;
DOI
10.1038/s41598-025-85927-x
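Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]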
Subject classification codes
07; 0710; 09
Abstract
Drug development is a costly and time-consuming process with high failure rates. Drug repurposing enables drug discovery by reusing already approved compounds. The outcomes of past clinical trials can be used to predict novel drug-disease associations by leveraging drug- and disease-related similarities. To tackle this classification problem, collaborative filtering with implicit feedback (optionally complemented by additional data on drugs and diseases) has become popular. It can handle large imbalances both between negative and positive known associations and between known and unknown associations. However, properly evaluating improvements over the state of the art is challenging, as there is no consensus approach to comparing models. We propose a reproducible methodology for comparing collaborative filtering-based drug repurposing methods. We illustrate this methodology by comparing 11 models from the literature on eight diverse drug repurposing datasets. Based on this benchmark, we derive guidelines to ensure a fair and comprehensive evaluation of the performance of these models. In particular, an uncontrolled bias on unknown associations might lead to severe data leakage and a misestimation of a model's true performance. Moreover, in drug repurposing, the ability of a model to extrapolate beyond its training distribution is crucial and should also be assessed. Finally, we identify a subcategory of collaborative filtering that appears efficient and robust to distribution shifts. Benchmarks constitute an essential step towards increased reproducibility and more accessible development of competitive drug repurposing methods.
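The abstract treats drug repurposing as a classification problem over drug-disease pairs, where a few pairs have known positive or negative outcomes and most are unknown. As a rough illustration of that setting only, the Python sketch below fits a plain logistic matrix factorization (one generic collaborative filtering model, not the authors' benchmark code and not necessarily one of the 11 compared models) on known pairs only, then scores held-out known pairs with ROC AUC. The function names, the 6 × 6 toy matrix, the unknown and held-out pairs, and all hyperparameters are hypothetical choices made for this example.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def train_logistic_mf(train, n_drugs, n_diseases, rank=2, lr=0.1, reg=0.01,
                      epochs=200, seed=0):
    """Fit drug and disease latent factors by SGD on an L2-regularized logistic loss.

    `train` maps (drug, disease) -> 1 (positive) or 0 (negative).
    Unknown pairs are absent from `train`, so they never act as labels.
    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_drugs)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_diseases)]
    items = list(train.items())
    for _ in range(epochs):
        rng.shuffle(items)
        for (i, j), y in items:
            score = sum(p * q for p, q in zip(P[i], Q[j]))
            err = y - sigmoid(score)  # gradient of the log-likelihood w.r.t. the score
            for k in range(rank):
                p_ik, q_jk = P[i][k], Q[j][k]
                P[i][k] += lr * (err * q_jk - reg * p_ik)
                Q[j][k] += lr * (err * p_ik - reg * q_jk)
    return P, Q


def predict(P, Q, i, j):
    """Predicted probability that drug i is associated with disease j."""
    return sigmoid(sum(p * q for p, q in zip(P[i], Q[j])))


def auc(labels, scores):
    """ROC AUC: chance that a random positive outranks a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: drugs 0-2 treat diseases 0-2 and drugs 3-5 treat
# diseases 3-5; every cross-block pair is a known negative (e.g. a failed trial).
n = 6
truth = {(i, j): int((i < 3) == (j < 3)) for i in range(n) for j in range(n)}
unknown = {(1, 5), (5, 1), (2, 3), (3, 2)}  # never observed: no label at all
test_pairs = [(0, 0), (4, 4), (0, 4), (4, 0)]  # held-out known associations

known = {pair: y for pair, y in truth.items() if pair not in unknown}
train = {pair: y for pair, y in known.items() if pair not in test_pairs}

P, Q = train_logistic_mf(train, n, n)
test_labels = [known[pair] for pair in test_pairs]
test_scores = [predict(P, Q, i, j) for i, j in test_pairs]
test_auc = auc(test_labels, test_scores)
print(f"held-out AUC: {test_auc:.2f}")
```

Keeping unknown pairs out of both the training loss and the test labels avoids silently scoring them as negatives. The abstract's warnings about bias on unknown associations and about distribution shift concern how such choices are made across real benchmark datasets, which this toy split does not attempt to reproduce.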
Pages: 18
Related papers (72 in total)
[1] Agarwal A., 2023. 36th Annual Conference on Learning Theory, p. 3821.
[2] Agarwal R., 2021. Advances in Neural Information Processing Systems, vol. 34.
[3] [Anonymous], 2018. fastai.
[4] Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., Harris M.A., Hill D.P., Issel-Tarver L., Kasarskis A., Lewis S., Matese J.C., Richardson J.E., Ringwald M., Rubin G.M., Sherlock G., 2000. Gene Ontology: tool for the unification of biology. Nature Genetics, 25(1), 25-29.
[5] Bekkar M., 2013. J Inf Eng Appl, vol. 3, p. 27. DOI 10.5121/IJDKP.2013.3402.
[6] Bell R.M., 2007. KDD-2007: Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 95.
[7] Benjamini Y., Hochberg Y., 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Statistical Methodology), 57(1), 289-300.
[8] National Center for Biotechnology Information (NCBI), 1988. National Library of Medicine (US), Bethesda (MD).
[9] Bhaskar S.B., 2017. Indian Journal of Anaesthesia, 61, 453. DOI 10.4103/ija.IJA_361_17.
[10] Brown A.S., Patel C.J., 2017. A standard database for drug repositioning. Scientific Data, 4.