Frequent substructure-based approaches for classifying chemical compounds

Cited by: 220
Authors
Deshpande, M [1]
Kuramochi, M [1]
Wale, N [1]
Karypis, G [1]
Affiliation
[1] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
Funding
National Science Foundation (USA);
Keywords
classification; chemical compounds; virtual screening; graphs; SVM;
DOI
10.1109/TKDE.2005.127
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Computational techniques that build models to correctly assign chemical compounds to various classes of interest have many applications in pharmaceutical research and are used extensively at various phases during the drug development process. These techniques are used to solve a number of classification problems such as predicting whether or not a chemical compound has the desired biological activity, is toxic or nontoxic, and filtering out drug-like compounds from large compound libraries. This paper presents a substructure-based classification algorithm that decouples the substructure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric substructures present in the data set. The advantage of this approach is that during classification model construction, all relevant substructures are available allowing the classifier to intelligently select the most discriminating ones. The computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Experimental evaluation on eight different classification problems shows that our approach is computationally scalable and, on average, outperforms existing schemes by 7 percent to 35 percent.
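The decoupling described in the abstract (discover substructures first, then let model construction choose among them) can be illustrated with a minimal sketch. It is not the authors' implementation: single labelled bonds stand in for the general topological and geometric subgraphs that a real frequent subgraph discovery algorithm would find, the selection rule is a simple class-frequency gap chosen for illustration, and every function name, threshold, and toy molecule here is hypothetical.

```python
from collections import Counter

# ASSUMPTION: a molecule is (atoms, bonds), where atoms maps an atom id to an
# element symbol and bonds is a list of (atom_id, atom_id, bond_type) triples.

def bond_features(graph):
    """Set of labelled-bond 'substructures' in one molecule (a simplified
    stand-in for mined subgraphs)."""
    atoms, bonds = graph
    feats = set()
    for i, j, bond in bonds:
        a, b = sorted((atoms[i], atoms[j]))  # orientation-independent key
        feats.add((a, bond, b))
    return feats

def mine_frequent(graphs, min_support):
    """Step 1: keep substructures present in at least min_support of the graphs."""
    counts = Counter()
    for g in graphs:
        counts.update(bond_features(g))
    threshold = min_support * len(graphs)
    return sorted(f for f, c in counts.items() if c >= threshold)

def to_vectors(graphs, features):
    """Step 2: describe each molecule by a binary presence vector."""
    rows = []
    for g in graphs:
        present = bond_features(g)
        rows.append([1 if f in present else 0 for f in features])
    return rows

def select_discriminating(vectors, labels, features, min_gap):
    """Step 3: keep features whose presence rate differs between the two
    classes by at least min_gap (hypothetical selection rule)."""
    pos = [v for v, y in zip(vectors, labels) if y == 1]
    neg = [v for v, y in zip(vectors, labels) if y == 0]
    kept = []
    for k, f in enumerate(features):
        rate_pos = sum(v[k] for v in pos) / len(pos)
        rate_neg = sum(v[k] for v in neg) / len(neg)
        if abs(rate_pos - rate_neg) >= min_gap:
            kept.append(f)
    return kept

# Toy data: four small molecules; label 1 marks those with a C=O bond.
graphs = [
    ({0: "C", 1: "C", 2: "O"}, [(0, 1, "-"), (1, 2, "-")]),
    ({0: "C", 1: "C", 2: "O"}, [(0, 1, "-"), (1, 2, "=")]),
    ({0: "C", 1: "C"}, [(0, 1, "-")]),
    ({0: "C", 1: "O"}, [(0, 1, "=")]),
]
labels = [0, 1, 0, 1]

frequent = mine_frequent(graphs, min_support=0.5)
vectors = to_vectors(graphs, frequent)
selected = select_discriminating(vectors, labels, frequent, min_gap=0.75)
print(frequent, vectors, selected)
```

The resulting binary vectors could then be passed to any classifier; the keywords name SVM as the one used in the paper.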
Pages: 1036-1050
Number of pages: 15