Molecular similarity searching using atom environments, information-based feature selection, and a naive Bayesian classifier

Cited by: 231
Authors
Bender, A
Mussa, HY
Glen, RC
Reiling, S
Affiliations
[1] Univ Cambridge, Dept Chem, Unilever Ctr Mol Sci Informat, Cambridge CB2 1EW, England
[2] Aventis Pharmaceut, Bridgewater, NJ 08807 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2004 / Vol. 44 / No. 1
DOI
10.1021/ci034207y
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
A novel technique for similarity searching is introduced. Molecules are represented by atom environments, which are fed into an information-gain-based feature selection. A naive Bayesian classifier is then employed for compound classification. The new method is tested by its ability to retrieve five sets of active molecules seeded in the MDL Drug Data Report (MDDR). In comparison experiments, the algorithm outperforms all current retrieval methods assessed here using two- and three-dimensional descriptors and offers insight into the significance of structural components for binding.
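The pipeline named in the abstract (set-valued structural features, information-gain feature selection, naive Bayesian scoring) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the feature strings are hypothetical stand-ins for atom environments, the function names are invented for illustration, and the Laplace smoothing and the choice to score only features present in a molecule are simplifying assumptions.

```python
import math


def entropy(pos, neg):
    """Shannon entropy (bits) of a two-class split with the given counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(feature, molecules, labels):
    """Entropy reduction obtained by splitting on presence of `feature`."""
    n = len(molecules)
    present = [y for m, y in zip(molecules, labels) if feature in m]
    absent = [y for m, y in zip(molecules, labels) if feature not in m]
    h_before = entropy(sum(labels), n - sum(labels))
    h_after = sum(
        len(part) / n * entropy(sum(part), len(part) - sum(part))
        for part in (present, absent)
    )
    return h_before - h_after


def select_features(molecules, labels, k):
    """Return the k features with the highest information gain."""
    candidates = sorted(set().union(*molecules))
    ranked = sorted(
        candidates,
        key=lambda f: information_gain(f, molecules, labels),
        reverse=True,
    )
    return ranked[:k]


def train_naive_bayes(molecules, labels, features):
    """Log-ratio weight log(P(f|active) / P(f|inactive)) per feature.

    Laplace smoothing is an assumption made here to avoid zero counts.
    """
    n_act = sum(labels)
    n_inact = len(labels) - n_act
    weights = {}
    for f in features:
        act = sum(1 for m, y in zip(molecules, labels) if y and f in m)
        inact = sum(1 for m, y in zip(molecules, labels) if not y and f in m)
        weights[f] = math.log(((act + 1) / (n_act + 2)) / ((inact + 1) / (n_inact + 2)))
    return weights


def score(molecule, weights):
    """Sum the weights of selected features present in the molecule."""
    return sum(w for f, w in weights.items() if f in molecule)


# Toy data: each molecule is a set of hypothetical feature strings.
train = [{"a", "b"}, {"a", "c"}, {"b", "d"}, {"c", "d"}]
labels = [1, 1, 0, 0]  # 1 = active, 0 = inactive

selected = select_features(train, labels, k=2)
weights = train_naive_bayes(train, labels, selected)
ranking = sorted([{"a", "x"}, {"d"}], key=lambda m: score(m, weights), reverse=True)
```

In this toy setup, a database would be ranked by `score`, and the per-feature weights indicate which features drive a high or low ranking, which loosely mirrors the abstract's remark about insight into the significance of structural components.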
Pages: 170-178
Page count: 9