Multi-Assay-Based Structure-Activity Relationship Models: Improving Structure-Activity Relationship Models by Incorporating Activity Information from Related Targets

Cited by: 32

Authors:

Ning, Xia [1]

Rangwala, Huzefa [2]

Karypis, George [1]

Affiliations:

[1] Univ Minnesota, Dept Comp Sci & Comp Engn, Minneapolis, MN 55455 USA

[2] George Mason Univ, Dept Comp Sci, Fairfax, VA 22030 USA

Source:

JOURNAL OF CHEMICAL INFORMATION AND MODELING | 2009 / Vol. 49 / Issue 11

Funding:

US National Science Foundation; US National Institutes of Health;

Keywords:

SUPPORT VECTOR MACHINES; DRUG DISCOVERY; BIOLOGICAL-ACTIVITY; HOMOLOGY DETECTION; PROTEIN; FINGERPRINT; PREDICTION; RECEPTORS; CONSTANTS; COMPOUND;

DOI:

10.1021/ci900182q

Chinese Library Classification number:

R914 [Medicinal Chemistry];

Discipline classification code:

100701 ;

Abstract:

Structure-activity relationship (SAR) models are used to inform and to guide the iterative optimization of chemical leads, and they play a fundamental role in modern drug discovery. In this paper, we present a new class of methods for building SAR models, referred to as multi-assay based, that utilize activity information from different targets. These methods first identify a set of targets that are related to the target under consideration, and then they employ various machine learning techniques that utilize activity information from these targets in order to build the desired SAR model. We developed different methods for identifying the set of related targets, which take into account the primary sequence of the targets or the structure of their ligands, and we also developed different machine learning techniques that were derived by using principles of semi-supervised learning, multi-task learning, and classifier ensembles. The comprehensive evaluation of these methods shows that they lead to considerable improvements over the standard SAR models that are based only on the ligands of the target under consideration. On a set of 117 protein targets, obtained from PubChem, these multi-assay-based methods achieve a receiver-operating characteristic score that is, on average, 7.0-7.2% higher than that achieved by the standard SAR models. Moreover, on a set of targets belonging to six protein families, the multi-assay-based methods outperform chemogenomics-based approaches by 4.33%.


Pages: 2444-2456

Page count: 13
