GA strategy for variable selection in QSAR studies: GAPLS and D-optimal designs for predictive QSAR model

Times cited: 43
Authors
Hasegawa, K
Funatsu, K
Affiliations
[1] Toyohashi Univ Technol, Dept Knowledge Based Informat Engn, Toyohashi, Aichi 441, Japan
[2] Kowa Co Ltd, Tokyo Res Labs, Tokyo 189, Japan
Source
THEOCHEM-Journal of Molecular Structure | 1998 / Vol. 425 / No. 3
Keywords
variable selection; genetic algorithm; partial least squares (PLS); GAPLS; benzodiazepine; D-optimal designs
DOI
10.1016/S0166-1280(97)00205-4
Chinese Library Classification (CLC) number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification codes
070304; 081704
Abstract
Variable selection is an important task in obtaining an interpretable and predictive model. In general, variable space has a complex landscape with many local solutions and, therefore, classical optimization techniques cannot be applied. The genetic algorithm (GA) is a recently developed optimization technique that has attracted much attention in several scientific fields. In a previous study (K. Hasegawa et al., J. Chem. Inf. Comput. Sci., 37 (1997) 306), we proposed a novel approach (GAPLS; GA-based PLS) in which variables are selected by GA and partial least squares (PLS). The purpose of this study is to examine whether GAPLS works well for a data set with a medium number of variables. The binding affinities of ligands to the benzodiazepine/GABA(A) receptor were used as a test example. The relationship between 42 physicochemical parameters at six positions on 57 benzodiazepines (BZs) and their binding affinities was investigated by GAPLS. The best PLS model with the selected variables was significantly more predictive than the one with all variables, and the structural requirements for receptor binding affinity could be estimated from the selected variables. © 1998 Elsevier Science B.V.
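The abstract describes GAPLS only as selecting variables "by GA and PLS", so the Python sketch below shows one plausible way such a loop can work; it is not the authors' code. Assumptions (all hypothetical): a chromosome is a bit string with one bit per descriptor; fitness is the leave-one-out cross-validated q² of a single-response PLS model (PLS1, fitted by NIPALS) built on the selected descriptors, with at most three latent variables; the genetic operators are generic textbook choices (binary tournament selection, one-point crossover, bit-flip mutation) with elitism. The names `pls1_coef`, `loo_q2` and `gapls`, all parameter values, and the synthetic 30 × 12 data set are illustrative stand-ins, not the benzodiazepine data.

```python
import numpy as np


def pls1_coef(X, y, n_components):
    """Fit a single-response PLS (PLS1) model with NIPALS.

    Returns (coef, x_mean, y_mean); predict with y_mean + (x - x_mean) @ coef.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean           # centred data, deflated below
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                         # weights: covariance of each column with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                      # y already fully explained; stop early
            break
        w = w / norm
        t = Xr @ w                            # scores
        tt = t @ t
        p = Xr.T @ t / tt                     # X loadings
        q = (yr @ t) / tt                     # y loading
        Xr = Xr - np.outer(t, p)              # deflate X and y
        yr = yr - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)    # B = W (P'W)^-1 q
    return coef, x_mean, y_mean


def loo_q2(X, y, n_components):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef, x_mean, y_mean = pls1_coef(X[keep], y[keep], n_components)
        press += (y[i] - (y_mean + (X[i] - x_mean) @ coef)) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def gapls(X, y, pop_size=30, generations=40, max_components=3, p_mut=0.05, seed=0):
    """Hypothetical GA + PLS variable selection; returns (mask, q2, history)."""
    rng = np.random.default_rng(seed)
    n_vars = X.shape[1]
    cache = {}                                # chromosome bytes -> fitness

    def fitness(chrom):
        key = chrom.tobytes()
        if key not in cache:
            idx = np.flatnonzero(chrom)
            # Assumption: empty subsets are infeasible; latent variables <= min(3, k).
            cache[key] = (-np.inf if idx.size == 0 else
                          loo_q2(X[:, idx], y, min(max_components, idx.size)))
        return cache[key]

    pop = rng.random((pop_size, n_vars)) < 0.5    # one bit per descriptor
    pop[0] = True                                 # seed the all-variables model
    history = []
    for gen in range(generations + 1):
        scores = np.array([fitness(c) for c in pop])
        elite = pop[np.argmax(scores)].copy()
        history.append(scores.max())
        if gen == generations:
            break
        children = [elite]                        # elitism: best survives unchanged
        while len(children) < pop_size:
            i, j, k, m = rng.integers(pop_size, size=4)
            a = pop[i] if scores[i] >= scores[j] else pop[j]   # binary tournaments
            b = pop[k] if scores[k] >= scores[m] else pop[m]
            cut = rng.integers(1, n_vars)                      # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_vars) < p_mut                # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    return elite, history[-1], history


# Hypothetical data: 30 compounds x 12 descriptors; only columns 0, 3, 7 matter.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(30, 12))
y = 1.5 * X[:, 0] - 1.0 * X[:, 3] + 2.0 * X[:, 7] + data_rng.normal(scale=0.1, size=30)

best, best_q2, history = gapls(X, y)
print("selected columns:", np.flatnonzero(best))
print(f"q2 (selected) = {best_q2:.3f}, q2 (all 12) = {loo_q2(X, y, 3):.3f}")
```

Seeding the initial population with the all-variables chromosome and carrying the best individual into every generation means the returned q² can never fall below that of the all-variables model, which makes the comparison drawn in the abstract easy to reproduce on toy data. Caching fitness by chromosome avoids refitting subsets the GA revisits.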
Pages: 255-262
Page count: 8
Related papers (24 in total)
[1] Baroni M, Costantino G, Cruciani G, Riganelli D, Valigi R, Clementi S. Generating optimal linear PLS estimations (GOLPE): an advanced chemometric tool for handling 3D-QSAR problems [J]. Quantitative Structure-Activity Relationships, 1993, 12(1): 9-20.
[2] Bush BL, Nachbar RB. Sample-distance partial least-squares: PLS optimized for many variables, with application to CoMFA [J]. Journal of Computer-Aided Molecular Design, 1993, 7(5): 587-619.
[3] Cocchi M. THEOCHEM-Journal of Molecular Structure, 1995, 331: 79.
[4] Cramer RD, Patterson DE, Bunce JD. Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins [J]. Journal of the American Chemical Society, 1988, 110(18): 5959-5967.
[5] de Aguiar PF, Bourguignon B, Khots MS, Massart DL, PhanThanLuu R. D-optimal designs [J]. Chemometrics and Intelligent Laboratory Systems, 1995, 30(2): 199-210.
[6] Draper NR. Applied Regression Analysis, 1966.
[8] Goldberg D. GENETIC ALGORITHMS S, 1989.
[9] Hansch C, Fujita T. ρ-σ-π analysis. A method for the correlation of biological activity and chemical structure [J]. Journal of the American Chemical Society, 1964, 86(8): 1616-&.
[10] Hansch C. Exploring QSAR: Fundamentals and Applications in Chemistry and Biology, 1995.