Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest

Cited by: 75
Authors
Li, Hongjian [1]
Leung, Kwong-Sak [1]
Wong, Man-Hon [1]
Ballester, Pedro J. [2, 3, 4, 5]
Affiliations
[1] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Sha Tin 999077, Hong Kong, Peoples R China
[2] INSERM, Canc Res Ctr Marseille, U1068, F-13009 Marseille, France
[3] Inst Paoli Calmettes, F-13009 Marseille, France
[4] Aix Marseille Univ, F-13284 Marseille, France
[5] CNRS, UMR7258, F-13009 Marseille, France
Keywords
docking; binding affinity prediction; machine-learning scoring functions; PROTEIN-LIGAND COMPLEXES; SCORING FUNCTIONS; VALIDATION; ACCURACY
DOI
10.3390/molecules200610947
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function with low-quality data is detrimental to its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments in support of this hypothesis. In this study, we investigated to what extent training a scoring function with data containing low-quality structural and binding data is detrimental to predictive performance. We found that low-quality data is not only non-detrimental but beneficial for the predictive performance of machine-learning scoring functions, although the improvement is smaller than that obtained from high-quality data. Furthermore, we observed that classical scoring functions are not able to effectively exploit data beyond an early threshold, regardless of its quality. This demonstrates that exploiting a larger data volume is more important for the performance of machine-learning scoring functions than restricting training to a smaller set of higher-quality data.
Pages: 10947-10962
Number of pages: 16