Prediction of HIV-1 Protease Inhibitors Using Machine Learning Approaches

Cited by: 7
Authors
Rao, Hanbing [1]
Yang, Guobing [2]
Tan, Ningxin [2]
Li, Ping [2]
Li, Zerong [1]
Li, Xiangyuan [2]
Affiliations
[1] Sichuan Univ, Coll Chem, Chengdu 610064, Peoples R China
[2] Sichuan Univ, Coll Chem Engn, Chengdu 610064, Peoples R China
Source
QSAR & COMBINATORIAL SCIENCE | 2009 / Vol. 28 / Issue 11-12
Funding
National Natural Science Foundation of China;
Keywords
Machine learning methods; HIV-1 protease inhibitors; Feature selection; Monte Carlo simulated annealing; Applicability domain; Medicinal chemistry; Structure-property relationships; SUPPORT VECTOR MACHINES; INDEPENDENT 4D-QSAR ANALYSIS; NEURAL-NETWORKS; APPLICABILITY DOMAINS; SECONDARY STRUCTURE; QSAR; CLASSIFICATION; DESIGN; SELECTION; MODELS;
DOI
10.1002/qsar.200960021
Chinese Library Classification number
R914 [Medicinal chemistry];
Discipline classification code
100701;
Abstract
In this study, multiple machine learning approaches, including support vector machine (SVM), k-nearest neighbor (k-NN), artificial neural networks (ANN) and logistic regression (LR), are applied to the classification of HIV-1 protease inhibitors (PIs) from molecular structure. A diverse set of 641 compounds, including 414 active agents (PIs+) and 227 inactive agents (PIs-), is adopted to develop the classification models. A hybrid feature selection method, which combines Fisher's score and Monte Carlo simulated annealing embedded in the support vector machine approach, is used to select the relevant descriptors from a pool of 1559 molecular descriptors. Three validation methods are employed to validate the models. The first is five-fold cross-validation, for which the averaged prediction accuracies of these machine learning approaches are between 83.9% and 93.5% for PIs+ and between 67.0% and 77.7% for PIs- agents. The second is an external test set, for which the prediction accuracies are between 84.6% and 95.2% for PIs+ and between 63.2% and 87.7% for PIs- agents. These two validation methods show that the SVM model has better overall performance than the other three machine learning models. The third is the y-scrambling method, which shows no obvious chance correlation in the developed SVM model. The prediction method proposed in this work can give better generalization ability than other recently published methods and can be used as an alternative fast filter in the virtual screening of large chemical databases.
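As a rough illustration of two quantities the abstract relies on, the sketch below computes a two-class Fisher score for a single descriptor and the per-class prediction accuracies (accuracy on PIs+ and on PIs-). It is a minimal sketch under stated assumptions, not the authors' implementation: the formula used, (mean difference)^2 / (sum of class variances), is one common form of the Fisher score, and the function names `fisher_score` and `class_accuracies` as well as the toy descriptor values are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Two-class Fisher score of one descriptor.

    Assumed form: (mean_pos - mean_neg)^2 / (var_pos + var_neg).
    labels: 1 for active (PIs+), 0 for inactive (PIs-).
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    diff_sq = (mean(pos) - mean(neg)) ** 2
    denom = pvariance(pos) + pvariance(neg)
    if denom == 0:
        # No within-class spread: perfectly separating or entirely constant.
        return float("inf") if diff_sq > 0 else 0.0
    return diff_sq / denom


def class_accuracies(y_true, y_pred):
    """Return (accuracy on PIs+, accuracy on PIs-)."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    hit_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    hit_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return hit_pos / n_pos, hit_neg / n_neg


# Toy data (hypothetical, not taken from the paper).
labels = [1, 1, 1, 0, 0, 0]
desc_a = [5.0, 5.2, 4.8, 1.0, 1.1, 0.9]  # class means far apart
desc_b = [2.0, 3.0, 1.0, 2.0, 3.0, 1.0]  # identical in both classes

# Rank descriptors by Fisher score, highest first.
ranking = sorted(
    {"desc_a": desc_a, "desc_b": desc_b}.items(),
    key=lambda item: fisher_score(item[1], labels),
    reverse=True,
)
print([name for name, _ in ranking])

acc_pos, acc_neg = class_accuracies(labels, [1, 1, 0, 0, 1, 0])
print(acc_pos, acc_neg)
```

A filter of this kind would only pre-rank descriptors; the Monte Carlo simulated annealing search embedded in the SVM, which the abstract describes as the second half of the hybrid method, is not sketched here.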
Pages: 1346-1357
Number of pages: 12