Automated Framework for Developing Predictive Machine Learning Models for Data-Driven Drug Discovery

Cited by: 6
Authors
Neves, Bruno J. [1,2]
Moreira-Filho, Jose T. [2]
Silva, Arthur C. [2]
Borba, Joyce V. V. B. [2]
Mottin, Melina [2]
Alves, Vinicius M. [3]
Braga, Rodolpho C. [4]
Muratov, Eugene N. [2,3,5]
Andrade, Carolina H. [2]
Affiliations
[1] Ctr Univ Anapolis UniEVANGELICA, Lab Quimioinformat LabChem, BR-75083515 Anapolis, Go, Brazil
[2] Univ Fed Goias, Lab Planejamento Farmacos & Modelagem Mol LabMol, Fac Farm, BR-74605170 Goiania, Go, Brazil
[3] Univ N Carolina, UNC Eshelman Sch Pharm, Lab Mol Modeling, Chapel Hill, NC 27955 USA
[4] InsilicAll Ltda, BR-04363090 Sao Paulo, SP, Brazil
[5] Univ Fed Paraiba, Dept Ciencias Farmaceut, BR-58059900 Joao Pessoa, PB, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
drug discovery; KNIME; predictive modeling; machine learning; virtual screening; QSAR; INTEGRATION; CHEMISTRY; CURATION; VERIFY; TRUST; SETS
DOI
10.21577/0103-5053.20200160
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
The increasing availability of extensive collections of chemical compounds associated with experimental data provides an opportunity to build predictive quantitative structure-activity relationship (QSAR) models using machine learning (ML) algorithms. These models can promote data-driven decisions and have the potential to speed up the drug discovery process and reduce its failure rates. However, many essential aspects of data preparation and modeling are not available in any standalone program. Here, we developed an automated framework for curating chemogenomics data and developing QSAR models for virtual screening using the open-source KoNstanz Information MinEr (KNIME) program. The workflow includes four modules: (i) dataset preparation and curation; (ii) chemical space analysis and structure-activity relationship (SAR) rules; (iii) modeling; and (iv) virtual screening (VS). As case studies, we applied these workflows to four datasets associated with different endpoints. The implemented protocol can efficiently curate chemical and biological data from public databases and generate robust QSAR models. We provide scientists with a simple, guided cheminformatics workbench that follows best practices widely accepted by the community and that they can adapt to solve their own research problems. The workflows are freely available for download from the GitHub and LabMol web portals.
Pages: 110-122
Page count: 13