Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest
Cited by: 75
Authors:
Li, Hongjian [1]
Leung, Kwong-Sak [1]
Wong, Man-Hon [1]
Ballester, Pedro J. [2,3,4,5]
Affiliations:
[1] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Sha Tin 999077, Hong Kong, Peoples R China
[2] INSERM, Canc Res Ctr Marseille, U1068, F-13009 Marseille, France
[3] Inst Paoli Calmettes, F-13009 Marseille, France
[4] Aix Marseille Univ, F-13284 Marseille, France
[5] CNRS, UMR7258, F-13009 Marseille, France
Source:
Keywords:
docking;
binding affinity prediction;
machine-learning scoring functions;
PROTEIN-LIGAND COMPLEXES;
SCORING FUNCTIONS;
VALIDATION;
ACCURACY;
DOI:
10.3390/molecules200610947
Chinese Library Classification (CLC) number:
Q5 [Biochemistry];
Q7 [Molecular Biology];
Discipline classification code:
071010 ;
081704 ;
Abstract:
Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function with low-quality data is detrimental to its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments in support of this hypothesis. In this study, we investigated to what extent training a scoring function with data containing low-quality structural and binding data is detrimental to predictive performance. We found that low-quality data is not only non-detrimental but beneficial for the predictive performance of machine-learning scoring functions, though the improvement is smaller than that coming from high-quality data. Furthermore, we observed that classical scoring functions are not able to effectively exploit data beyond an early threshold, regardless of its quality. This demonstrates that exploiting a larger data volume is more important for the performance of machine-learning scoring functions than restricting training to a smaller set of higher-quality data.
Pages: 10947-10962
Page count: 16