A Novel Automated Framework for QSAR Modeling of Highly Imbalanced Leishmania High-Throughput Screening Data

Cited by: 16
Authors
Casanova-Alvarez, Omar [1]
Morales-Helguera, Aliuska [2]
Angel Cabrera-Perez, Miguel [2]
Molina-Ruiz, Reinaldo [2]
Molina, Christophe [3]
Affiliations
[1] Univ Cent Marta Abreu Las Villas, Dept Quim, Fac Quim Farm, Santa Clara 54830, Villa Clara, Cuba
[2] Univ Cent Marta Abreu Las Villas, Ctr Bioact Quim, Santa Clara 54830, Villa Clara, Cuba
[3] PIKAIROS SA, F-31650 St Orens De Gameville, France
Keywords
LIPOSOMAL AMPHOTERICIN-B; CUTANEOUS LEISHMANIASIS; FEATURE-SELECTION; CLASSIFICATION; METRONIDAZOLE; MILTEFOSINE; INFORMATION; INEFFICACY; ENSEMBLES; DISCOVERY
DOI
10.1021/acs.jcim.0c01439
Chinese Library Classification (CLC) number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract
Quantitative structure-activity relationship (QSAR) models for the in silico prediction of antileishmanial activity have so far been developed on small, limited datasets. The availability of large and diverse high-throughput screening data now gives the scientific community an opportunity to model this activity from chemical structure. In this study, we present the first automated KNIME workflow for modeling a large, diverse, and highly imbalanced dataset of compounds with antileishmanial activity. Because the data are strongly biased toward inactive compounds, a novel strategy was implemented based on selecting different balanced training sets and then building a consensus model that uses single decision trees as the base model and three criteria for combining outputs. The decision tree consensus was adopted after comparing its classification performance with that of consensuses built on Gaussian-Naive-Bayes, Support-Vector-Machine, Random-Forest, Gradient-Boost, and Multi-Layer-Perceptron base models. All these consensuses were rigorously validated using internal test and external validation sets and were compared against each other using Friedman and Bonferroni-Dunn statistics. For the retained decision tree-based consensus model at the lowest consensus level, which covers 100% of the chemical space of the dataset, the overall accuracy statistics were 71 to 74% for the test set and 71 to 76% for the external set. For a reduced chemical space (21%) and an incremental consensus level, the accuracy statistics improved substantially, to 86 to 92% for the test set and 88 to 92% for the external set. These results show that the consensus model can be used either to prioritize a relatively small set of active compounds with high prediction sensitivity, by setting the Incremental Consensus to high level values, or to predict as many compounds as possible, by lowering the Incremental Consensus level.
Finally, the workflow eliminates human bias, improves the reproducibility of the procedure, and allows other researchers to reproduce our design and apply it to their own QSAR problems.
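The two ideas the abstract combines, balanced training sets drawn from an imbalanced dataset and a consensus whose strictness trades coverage for accuracy, can be illustrated with a minimal Python sketch. This is not the authors' KNIME workflow: the function names, the undersampling scheme, and the definition of the consensus level as a minimum number of agreeing base models (`min_agree`) are all assumptions made for illustration, and the base-model votes are hard-coded rather than produced by trained decision trees.

```python
import random

def balanced_subsets(labels, n_subsets, seed=0):
    """Build index lists that pair every active (1) with an equally sized
    random draw of inactives (0). Hypothetical undersampling scheme."""
    rng = random.Random(seed)
    actives = [i for i, y in enumerate(labels) if y == 1]
    inactives = [i for i, y in enumerate(labels) if y == 0]
    return [actives + rng.sample(inactives, len(actives))
            for _ in range(n_subsets)]

def consensus_predict(votes, min_agree):
    """Return the majority class if at least `min_agree` base models agree
    on it, otherwise None (the compound is left unpredicted)."""
    n_active = sum(votes)
    n_inactive = len(votes) - n_active
    if n_active > n_inactive and n_active >= min_agree:
        return 1
    if n_inactive > n_active and n_inactive >= min_agree:
        return 0
    return None

def coverage(predictions):
    """Fraction of compounds that received a prediction."""
    return sum(p is not None for p in predictions) / len(predictions)

# Toy data: 2 actives among 10 compounds (assumed, not from the paper).
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
subsets = balanced_subsets(labels, n_subsets=5)

# Hard-coded votes of 5 base models for 4 compounds (illustrative only).
votes_per_compound = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
]
low = [consensus_predict(v, min_agree=3) for v in votes_per_compound]
high = [consensus_predict(v, min_agree=5) for v in votes_per_compound]
```

With these toy votes, a simple majority (`min_agree=3`) predicts every compound, while requiring unanimity (`min_agree=5`) predicts only the first one. This mirrors, in miniature, the trade-off the abstract reports between full coverage at the lowest consensus level and higher accuracy on a reduced chemical space.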
Pages: 3213-3231
Page count: 19