Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest

Times cited: 75

Authors:

Li, Hongjian [1]

Leung, Kwong-Sak [1]

Wong, Man-Hon [1]

Ballester, Pedro J. [2,3,4,5]

Affiliations:

[1] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Sha Tin 999077, Hong Kong, Peoples R China

[2] INSERM, Canc Res Ctr Marseille, U1068, F-13009 Marseille, France

[3] Inst Paoli Calmettes, F-13009 Marseille, France

[4] Aix Marseille Univ, F-13284 Marseille, France

[5] CNRS, UMR7258, F-13009 Marseille, France

Source:

MOLECULES | 2015 / Volume 20 / Issue 06

Keywords:

docking; binding affinity prediction; machine-learning scoring functions; PROTEIN-LIGAND COMPLEXES; SCORING FUNCTIONS; VALIDATION; ACCURACY;

DOI:

10.3390/molecules200610947

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function with low-quality data is detrimental to its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments in support of this hypothesis. In this study, we investigated to what extent training a scoring function with data that includes low-quality structural and binding data is detrimental to predictive performance. We found that low-quality data is not only non-detrimental but beneficial for the predictive performance of machine-learning scoring functions, though the improvement is smaller than that obtained from high-quality data. Furthermore, we observed that classical scoring functions are not able to effectively exploit data beyond an early threshold, regardless of its quality. This demonstrates that exploiting a larger data volume is more important for the performance of machine-learning scoring functions than restricting training to a smaller set of higher-quality data.

Pages: 10947-10962

Page count: 16
