How Wrong Can We Get? A Review of Machine Learning Approaches and Error Bars

Times cited: 21
Authors
Schwaighofer, Anton [2]
Schroeter, Timon [1]
Mika, Sebastian [3]
Blanchard, Gilles [2]
Affiliations
[1] Tech Univ Berlin, Dept Comp Sci, D-10587 Berlin, Germany
[2] Fraunhofer FIRST, D-12489 Berlin, Germany
[3] Idalab GmbH, D-10178 Berlin, Germany
Keywords
Machine learning; error bars; model building; parameter estimation; decision tree; support vector machine; Gaussian process; SUPPORT VECTOR MACHINES; AQUEOUS SOLUBILITY; ORGANIC-COMPOUNDS; FEATURE-SELECTION; PREDICTION; KERNELS; MODELS; CLASSIFICATION; LIPOPHILICITY; SIMILARITY
DOI
10.2174/138620709788489064
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
A large number of different machine learning methods can potentially be used for ligand-based virtual screening. In this contribution, we focus on three nonlinear methods: support vector regression, Gaussian process models, and decision trees. For each method, we provide a short, intuitive introduction and discuss how confidence estimates (error bars) can be obtained from it. We then turn to important aspects of model building and evaluation, such as methodologies for model selection and evaluation, performance criteria, and how the quality of error bar estimates can be verified. Besides introducing the respective methods, we point to available implementations and discuss important issues for practical application.
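The abstract names Gaussian process models as one source of error bars and raises the question of how their quality can be verified. The sketch below illustrates both ideas under stated assumptions: one-dimensional inputs, a squared-exponential kernel, and fixed hyperparameters (`length_scale=1.0`, `signal_var=1.0`, `noise_var=0.01`) chosen purely for illustration, whereas in practice they would be set by model selection. It computes the GP predictive mean and standard deviation (the error bar), then applies one simple calibration check: the fraction of targets that fall within ±2 standard deviations. All function names are hypothetical, and the coverage check is a generic diagnostic, not necessarily the verification procedure the authors use.

```python
import math


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular L with L L^T == matrix (matrix must be positive definite)."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(matrix[i][i] - s)
            else:
                low[i][j] = (matrix[i][j] - s) / low[j][j]
    return low


def solve_lower(low, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def solve_upper_t(low, b):
    """Solve L^T x = b by back substitution (L is lower-triangular)."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / low[i][i]
    return x


def gp_predict(x_train, y_train, x_test,
               length_scale=1.0, signal_var=1.0, noise_var=0.01):
    """Return GP predictive means and standard deviations (error bars)."""
    n = len(x_train)
    cov = [[rbf_kernel(x_train[i], x_train[j], length_scale, signal_var)
            + (noise_var if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    low = cholesky(cov)
    # alpha = (K + noise_var * I)^{-1} y, computed with two triangular solves
    alpha = solve_upper_t(low, solve_lower(low, y_train))
    means, stds = [], []
    for x in x_test:
        k_star = [rbf_kernel(x, xt, length_scale, signal_var) for xt in x_train]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        # latent variance: k(x, x) - k*^T (K + noise_var * I)^{-1} k*
        v = solve_lower(low, k_star)
        latent_var = signal_var - sum(vi * vi for vi in v)
        # clamp round-off below zero, then add noise to predict a new observation
        stds.append(math.sqrt(max(latent_var, 0.0) + noise_var))
    return means, stds


def coverage(y_true, means, stds, z=2.0):
    """Fraction of targets that fall inside mean +/- z * std."""
    inside = sum(1 for y, m, s in zip(y_true, means, stds) if abs(y - m) <= z * s)
    return inside / len(y_true)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [math.sin(x) for x in xs]
    queries = [2.0, 2.5, 50.0]  # 50.0 lies far outside the training data
    mu, sd = gp_predict(xs, ys, queries)
    for x, m, s in zip(queries, mu, sd):
        print(f"x={x:5.1f}  mean={m:+.3f}  error bar={s:.3f}")
    x_new = [0.5, 1.5, 2.5, 3.5]
    mu2, sd2 = gp_predict(xs, ys, x_new)
    print("coverage at +/-2 sd:", coverage([math.sin(x) for x in x_new], mu2, sd2))
```

Running the script shows the property that makes GP error bars useful for judging a model's domain of applicability: near the training inputs the error bar shrinks toward the noise level, while at `x=50.0` the prediction reverts to the prior mean with an error bar of about `sqrt(signal_var + noise_var)`. For a well-calibrated Gaussian predictive distribution, roughly 95% of targets should fall inside ±2 standard deviations on a large held-out set.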
Pages: 453-468
Page count: 16