Large-scale comparison of machine learning methods for drug target prediction on ChEMBL

Cited by: 326
Authors
Mayr, Andreas [1, 2]
Klambauer, Guenter [1, 2]
Unterthiner, Thomas [1, 2]
Steijaert, Marvin [3]
Wegner, Jorg K. [4]
Ceulemans, Hugo [4]
Clevert, Djork-Arne [5]
Hochreiter, Sepp [1, 2]
Affiliations
[1] Johannes Kepler Univ Linz, LIT AI Lab, Linz, Austria
[2] Johannes Kepler Univ Linz, Inst Bioinformat, Linz, Austria
[3] Open Analyt NV, Antwerp, Belgium
[4] Janssen Pharmaceut NV, Beerse, Belgium
[5] Bayer AG, Leverkusen, Germany
Keywords
DEEP NEURAL-NETWORKS; MODELS; CLASSIFICATION; VALIDATION; DESIGN
DOI
10.1039/c8sc00148k
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Deep learning is currently the most successful machine learning technique in a wide range of application areas and has recently been applied successfully in drug discovery research to predict potential drug targets and to screen for active molecules. However, due to (1) the lack of large-scale studies, (2) the compound series bias that is characteristic of drug discovery datasets and (3) the hyperparameter selection bias that comes with the high number of potential deep learning architectures, it remains unclear whether deep learning can indeed outperform existing computational methods in drug discovery tasks. We therefore assessed the performance of several deep learning methods on a large-scale drug discovery dataset and compared the results with those of other machine learning and target prediction methods. To avoid potential biases from hyperparameter selection or compound series, we used a nested cluster-cross-validation strategy. We found (1) that deep learning methods significantly outperform all competing methods and (2) that the predictive performance of deep learning is in many cases comparable to that of tests performed in wet labs (i.e., in vitro assays).
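The nested cluster-cross-validation idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each compound already carries a cluster label (for example, one label per compound series) and that whole clusters are assigned to folds, so that related compounds never appear on both sides of a split. The function names, the greedy fold-balancing rule, and the fold count of 3 are illustrative assumptions, not details taken from this record.

```python
from collections import defaultdict


def assign_folds(cluster_ids, n_folds=3):
    """Place whole clusters into folds (hypothetical greedy balancing).

    Clusters are never split, so compounds from the same cluster
    always end up in the same fold.
    """
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)
    folds = [[] for _ in range(n_folds)]
    # Largest clusters first; each goes to the currently smallest fold.
    for cid in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[cid])
    return folds


def nested_cluster_cv(cluster_ids, n_folds=3):
    """Yield (test_indices, inner_splits) for each outer fold.

    inner_splits is a list of (train_indices, validation_indices) pairs
    built only from the non-test folds. Hyperparameters would be chosen
    on the validation folds, and the test fold is used once, for the
    final performance estimate.
    """
    folds = assign_folds(cluster_ids, n_folds)
    for t in range(n_folds):
        inner = []
        for v in range(n_folds):
            if v == t:
                continue
            train = [i for f in range(n_folds)
                     if f not in (t, v) for i in folds[f]]
            inner.append((train, folds[v]))
        yield folds[t], inner


if __name__ == "__main__":
    # Toy cluster labels for nine compounds (made up for illustration).
    labels = ["a", "a", "b", "b", "b", "c", "d", "d", "e"]
    for test, inner in nested_cluster_cv(labels):
        print("test:", test)
        for train, val in inner:
            print("  train:", train, "validation:", val)
```

Because the test fold is excluded from the inner loop, the hyperparameter choice cannot leak information from the test compounds, and because folds are built from whole clusters, the estimate is not inflated by near-duplicate compounds from the same series.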
Pages: 5441-5451
Number of pages: 11
Related papers
50 in total
  • [1] Large-scale comparison of machine learning algorithms for target prediction of natural products
    Liang, Lu
    Liu, Ye
    Kang, Bo
    Wang, Ru
    Sun, Meng-Yu
    Wu, Qi
    Meng, Xiang-Fei
    Lin, Jian-Ping
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (05)
  • [2] Large-scale comparison of machine learning methods for profiling prediction of kinase inhibitors
    Jiangxia Wu
    Yihao Chen
    Jingxing Wu
    Duancheng Zhao
    Jindi Huang
    MuJie Lin
    Ling Wang
    Journal of Cheminformatics, 16
  • [3] Large-scale comparison of machine learning methods for profiling prediction of kinase inhibitors
    Wu, Jiangxia
    Chen, Yihao
    Wu, Jingxing
    Zhao, Duancheng
    Huang, Jindi
    Lin, Mujie
    Wang, Ling
    JOURNAL OF CHEMINFORMATICS, 2024, 16 (01)
  • [4] ChEMBL: a large-scale bioactivity database for drug discovery
    Gaulton, Anna
    Bellis, Louisa J.
    Bento, A. Patricia
    Chambers, Jon
    Davies, Mark
    Hersey, Anne
    Light, Yvonne
    McGlinchey, Shaun
    Michalovich, David
    Al-Lazikani, Bissan
    Overington, John P.
    NUCLEIC ACIDS RESEARCH, 2012, 40 (D1) : D1100 - D1107
  • [5] Optimization Methods for Large-Scale Machine Learning
    Bottou, Leon
    Curtis, Frank E.
    Nocedal, Jorge
    SIAM REVIEW, 2018, 60 (02) : 223 - 311
  • [6] Large-scale prediction of drug-target relationships
    Kuhn, Michael
    Campillos, Monica
    Gonzalez, Paula
    Jensen, Lars Juhl
    Bork, Peer
    FEBS LETTERS, 2008, 582 (08) : 1283 - 1290
  • [7] Large-Scale Machine Learning for Business Sector Prediction
    Angenent, Mitch N.
    Barata, Antonio Pereira
    Takes, Frank W.
    PROCEEDINGS OF THE 35TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING (SAC'20), 2020, : 1143 - 1146
  • [8] A review of Nystrom methods for large-scale machine learning
    Sun, Shiliang
    Zhao, Jing
    Zhu, Jiang
    INFORMATION FUSION, 2015, 26 : 36 - 48
  • [9] Evaluation of Machine Learning Methods on Large-Scale Spatiotemporal Data for Photovoltaic Power Prediction
    Sauter, Evan
    Mughal, Maqsood
    Zhang, Ziming
    ENERGIES, 2023, 16 (13)
  • [10] Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction
    Matthew C. Robinson
    Robert C. Glen
    Alpha A. Lee
    Journal of Computer-Aided Molecular Design, 2020, 34 : 717 - 730