Supervised self-organizing maps in drug discovery. 1. Robust behavior with overdetermined data sets

Cited by: 30
Authors
Xiao, YD
Clauset, A
Harris, R
Bayram, E
Santago, P
Schmitt, JD
Affiliations
[1] Targacept Inc, Mol Design Grp, Winston Salem, NC 27101 USA
[2] Univ New Mexico, Dept Comp Sci, Albuquerque, NM 87131 USA
[3] Wake Forest Univ, Sch Biomed Engn & Sci, Virginia Tech, Winston Salem, NC 27157 USA
DOI
10.1021/ci0500839
Chinese Library Classification number
R914 [Medicinal chemistry];
Discipline classification code
100701;
Abstract
The utility of the supervised Kohonen self-organizing map was assessed and compared to several statistical methods used in QSAR analysis. The self-organizing map (SOM) describes a family of nonlinear, topology-preserving mapping methods with attributes of both vector quantization and clustering that provide visualization options unavailable with other nonlinear methods. In contrast to most chemometric methods, the supervised SOM (sSOM) is shown to be relatively insensitive to noise and feature redundancy. Additionally, sSOMs can make use of descriptors having only nominal linear correlation with the target property. Results herein are contrasted to partial least squares, stepwise multiple linear regression, the genetic functional algorithm, and genetic partial least squares, collectively referred to throughout as the "standard methods". The k-nearest neighbor (kNN) classification method was also performed to provide a direct comparison with a different classification method. The widely studied dihydrofolate reductase (DHFR) inhibition data set of Hansch and Silipo is used to evaluate the ability of sSOMs to classify unknowns as a function of increasing class resolution. The contribution of the sSOM neighborhood kernel to its predictive ability is assessed in two experiments: (1) training with the k-means clustering limit, where the neighborhood radius is zero throughout the training regimen, and (2) training the sSOM until the neighborhood radius is reduced to zero. Results demonstrate that sSOMs provide more accurate predictions than standard linear QSAR methods.
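The role of the neighborhood radius in the two experiments can be illustrated with a minimal sketch of one online SOM update. This is not the authors' implementation: the 1-D node grid, the Gaussian kernel, the learning rate, and all function names (`best_matching_unit`, `neighborhood`, `som_step`) are assumptions chosen for illustration, and the supervised part of the sSOM (how class information enters training) is not shown. With a radius of zero only the winning node moves, which corresponds to the k-means clustering limit mentioned in the abstract.

```python
import math

def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    dists = [sum((w_i - x_i) ** 2 for w_i, x_i in zip(w, x)) for w in weights]
    return dists.index(min(dists))

def neighborhood(grid_dist, radius):
    """Assumed Gaussian kernel on grid distance; winner-only when radius is zero."""
    if radius <= 0:
        return 1.0 if grid_dist == 0 else 0.0
    return math.exp(-(grid_dist ** 2) / (2.0 * radius ** 2))

def som_step(weights, x, lr, radius):
    """One online update on a 1-D grid; returns the new weights and the winner index."""
    bmu = best_matching_unit(weights, x)
    new_weights = []
    for j, w in enumerate(weights):
        h = neighborhood(abs(j - bmu), radius)
        # Each node moves toward x in proportion to its kernel value.
        new_weights.append([w_i + lr * h * (x_i - w_i) for w_i, x_i in zip(w, x)])
    return new_weights, bmu

# Three nodes on a line, one two-dimensional training sample (illustrative values).
weights = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
x = [0.9, 1.2]

with_kernel, bmu = som_step(weights, x, lr=0.5, radius=1.0)   # neighbors also move
winner_only, _ = som_step(weights, x, lr=0.5, radius=0.0)     # k-means-like limit
```

In the zero-radius call the first and third nodes are left unchanged, while in the call with a positive radius they are pulled toward the sample as well; that coupling between grid neighbors is what the neighborhood kernel contributes.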
Pages: 1749-1758
Page count: 10
Related papers
28 in total
[1]  
*ACC INC, 2003, CER 2 MOD ENV REL 4
[2]  
Anderberg M.R., 1973, Probability and Mathematical Statistics
[3]  
[Anonymous], 1961, Adaptive Control Processes: A Guided Tour, DOI 10.1515/9781400874668
[4]  
ANZALI S, 1998, USE SELF ORGANIZING, P273
[5]   Genetic algorithms and self-organizing maps: a powerful combination for modeling complex QSAR and QSPR problems [J].
Bayram, E ;
Santago, P ;
Harris, R ;
Xiao, YD ;
Clauset, AJ ;
Schmitt, JD .
JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 2004, 18 (7-9) :483-493
[6]   NEAREST NEIGHBOR PATTERN CLASSIFICATION [J].
COVER, TM ;
HART, PE .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1967, 13 (01) :21-+
[7]  
Draper NR, 1966, APPL REGRESSION ANAL, P407
[8]   An integrated SOM-fuzzy ARTMAP neural system for the evaluation of toxicity [J].
Espinosa, G ;
Arenas, A ;
Giralt, F .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2002, 42 (02) :343-359
[9]   NEURAL NETWORKS IN CHEMISTRY [J].
GASTEIGER, J ;
ZUPAN, J .
ANGEWANDTE CHEMIE-INTERNATIONAL EDITION IN ENGLISH, 1993, 32 (04) :503-527
[10]   PARTIAL LEAST-SQUARES REGRESSION - A TUTORIAL [J].
GELADI, P ;
KOWALSKI, BR .
ANALYTICA CHIMICA ACTA, 1986, 185 :1-17