Large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery

Cited by: 104
Authors
Bosc, Nicolas [1]
Atkinson, Francis [1]
Felix, Eloy [1]
Gaulton, Anna [1]
Hersey, Anne [1]
Leach, Andrew R. [1]
Affiliations
[1] European Bioinformatics Institute (EMBL-EBI), Chemogenomics Team, Wellcome Genome Campus, Cambridge CB10 1SD, England
Funding
Wellcome Trust (UK); EU Horizon 2020
Keywords
QSAR; Mondrian conformal prediction; ChEMBL; Classification models; Cheminformatics; Applicability domain; Classification; Database; Chemicals; Design
DOI
10.1186/s13321-018-0325-4
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Structure-activity relationship modelling is frequently used in the early stage of drug discovery to assess the activity of a compound on one or several targets, and can also be used to assess the interaction of compounds with liability targets. QSAR models have been used for these and related applications over many years, with good success. Conformal prediction is a relatively new QSAR approach that provides information on the certainty of a prediction, and so helps in decision-making. However, it is not always clear how best to make use of this additional information. In this article, we describe a case study that directly compares conformal prediction with traditional QSAR methods for large-scale predictions of target-ligand binding. The ChEMBL database was used to extract a data set comprising data from 550 human protein targets with different bioactivity profiles. For each target, a QSAR model and a conformal predictor were trained and their results compared. To simulate a real-world application, the models were then evaluated on new data published after the original models were built. The comparative study highlights the similarities between the two techniques, but also some differences that are important to bear in mind when the methods are used in practical drug discovery applications.
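
To illustrate how a Mondrian (class-conditional) conformal predictor attaches certainty information to a prediction, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the class labels "active" and "inactive", the nonconformity scores, and the significance levels are all hypothetical values chosen for illustration. The sketch assumes that a nonconformity score (higher means stranger) has already been computed by some underlying model for each calibration compound and for the test compound under each candidate label.

```python
from typing import Dict, List, Set


def mondrian_p_values(
    calibration: Dict[str, List[float]],
    test_scores: Dict[str, float],
) -> Dict[str, float]:
    """Compute one p-value per class, using only that class's calibration scores.

    calibration: hypothetical nonconformity scores of calibration compounds,
                 grouped by their true class (this grouping is the Mondrian part).
    test_scores: hypothetical nonconformity score of the test compound under
                 each candidate class label.
    """
    p_values = {}
    for label, score in test_scores.items():
        class_scores = calibration[label]
        # Count calibration compounds at least as nonconforming as the test compound.
        at_least_as_strange = sum(1 for s in class_scores if s >= score)
        # The +1 terms account for the test compound itself.
        p_values[label] = (at_least_as_strange + 1) / (len(class_scores) + 1)
    return p_values


def prediction_set(p_values: Dict[str, float], significance: float) -> Set[str]:
    """Keep every label whose p-value exceeds the chosen significance level."""
    return {label for label, p in p_values.items() if p > significance}


# Made-up calibration scores for an imbalanced two-class problem.
calibration = {
    "active": [0.10, 0.20, 0.30, 0.60],
    "inactive": [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85],
}
# Made-up scores for a single test compound under each candidate label.
test_scores = {"active": 0.25, "inactive": 0.80}

p = mondrian_p_values(calibration, test_scores)
both = prediction_set(p, significance=0.10)
single = prediction_set(p, significance=0.25)
empty = prediction_set(p, significance=0.70)
print(p, both, single, empty)
```

Unlike a traditional QSAR classifier, which always returns exactly one label, the output here is a set of labels that depends on the significance level: it may contain both classes (the model cannot discriminate), a single class, or no class at all (the compound looks unlike anything seen in calibration). Computing p-values per class is what keeps the guarantee meaningful when actives and inactives are imbalanced.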
Pages: 16