Data mining and machine learning techniques for the identification of mutagenicity inducing substructures and structure activity relationships of noncongeneric compounds

Cited by: 164
Authors
Helma, C
Cramer, T
Kramer, S
De Raedt, L
Affiliations
[1] Univ Freiburg, Machine Learning Lab, Inst Comp Sci, D-79110 Freiburg, Germany
[2] Tech Univ Munich, Inst Comp Sci, D-85748 Garching, Germany
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2004 / Vol. 44 / Issue 04
DOI
10.1021/ci034254q
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
This paper explores the utility of data mining and machine learning algorithms for the induction of mutagenicity structure-activity relationships (SARs) from noncongeneric data sets. We compare (i) a newly developed algorithm (MOLFEA) for the generation of descriptors (molecular fragments) for noncongeneric compounds with traditional SAR approaches (molecular properties) and (ii) different machine learning algorithms for the induction of SARs from these descriptors. In addition, we investigate the optimal parameter settings for these programs and give an exemplary interpretation of the derived models. The predictive accuracies of models using MOLFEA-derived descriptors are approximately 10-15 percentage points higher than those of models using molecular properties alone. Using both types of descriptors together does not improve the derived models. Of the applied machine learning techniques, the rule learner PART and support vector machines gave the best results, although the differences between the learning algorithms are only marginal. We were able to achieve predictive accuracies of up to 78% for 10-fold cross-validation. The resulting models are relatively easy to interpret and usable for predictive as well as explanatory purposes.
Pages: 1402-1411
Page count: 10