How Wrong Can We Get? A Review of Machine Learning Approaches and Error Bars

Cited by: 21

Authors:

Schwaighofer, Anton [2]

Schroeter, Timon [1]

Mika, Sebastian [3]

Blanchard, Gilles [2]

Affiliations:

[1] Tech Univ Berlin, Dept Comp Sci, D-10587 Berlin, Germany

[2] Fraunhofer FIRST, D-12489 Berlin, Germany

[3] Idalab GmbH, D-10178 Berlin, Germany

Source:

COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING | 2009 / Vol. 12 / Issue 05

Keywords:

Machine learning; error bars; model building; parameter estimation; decision tree; support vector machine; Gaussian process; SUPPORT VECTOR MACHINES; AQUEOUS SOLUBILITY; ORGANIC-COMPOUNDS; FEATURE-SELECTION; PREDICTION; KERNELS; MODELS; CLASSIFICATION; LIPOPHILICITY; SIMILARITY;

DOI:

10.2174/138620709788489064

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

A large number of different machine learning methods can potentially be used for ligand-based virtual screening. In our contribution, we focus on three specific nonlinear methods, namely support vector regression, Gaussian process models, and decision trees. For each of these methods, we provide a short and intuitive introduction. In particular, we will also discuss how confidence estimates (error bars) can be obtained from these methods. We continue with important aspects for model building and evaluation, such as methodologies for model selection, evaluation, performance criteria, and how the quality of error bar estimates can be verified. Besides an introduction to the respective methods, we will also point to available implementations, and discuss important issues for the practical application.


Pages: 453-468

Page count: 16

References: 59 in total

[21] Hastie T., 2009, The elements of statistical learning: data mining, inference, and prediction, P9
[22] HONG, 2002, ENV HLTH PERSP, V110, P29
[23] Jebara T, 2004, J MACH LEARN RES, V5, P819
[24] Kless A, 2004, LECT NOTES ARTIF INT, V3303, P191
[25] Kühne, R; Ebert, RU; Schüürmann, G. Model selection based on structural similarity: Method description and application to water solubility prediction [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2006, 46 (02): 636-641
[26] LASKOV P, 2000, NIPS 12, P484
[27] Liu, Y. A comparative study on feature selection methods for drug discovery [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2004, 44 (05): 1823-1828
[28] MALLOWS, CL. SOME COMMENTS ON CP [J]. TECHNOMETRICS, 1973, 15 (04): 661-675
[29] Manallack, DT; Tehan, BG; Gancia, E; Hudson, BD; Ford, MG; Livingstone, DJ; Whitley, DC; Pitt, WR. A consensus neural network-based technique for discriminating soluble and poorly soluble compounds [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2003, 43 (02): 674-679
[30] Müller, KR; Rätsch, G; Sonnenburg, S; Mika, S; Grimm, M; Heinrich, N. Classifying 'drug-likeness' with kernel-based learning methods [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2005, 45 (02): 249-253
