Structural and Sequence Similarity Makes a Significant Impact on Machine-Learning-Based Scoring Functions for Protein-Ligand Interactions
Cited by: 74
Authors:
Li, Yang [1,2]
Yang, Jianyi [2]
Affiliations:
[1] Nankai Univ, Coll Life Sci, Tianjin 300071, Peoples R China
[2] Nankai Univ, Sch Math Sci, Tianjin 300071, Peoples R China
Funding:
National Natural Science Foundation of China;
Keywords:
OUT CROSS-VALIDATION;
BINDING-AFFINITY;
RANDOM FOREST;
PREDICTION;
LEAD;
OPTIMIZATION;
APPROPRIATE;
ACCURACY;
DOCKING;
SET;
DOI:
10.1021/acs.jcim.7b00049
Chinese Library Classification number:
R914 [Medicinal Chemistry];
Discipline classification code:
100701;
Abstract:
The prediction of protein-ligand binding affinity has recently been improved remarkably by machine-learning-based scoring functions. For example, using a set of simple descriptors representing the atomic distance counts, the RF-Score improves the Pearson correlation coefficient to about 0.8 on the core set of the PDBbind 2007 database, which is significantly higher than the performance of any conventional scoring function on the same benchmark. A few studies have discussed the performance of machine-learning-based methods, but the reason for this improvement remains unclear. In this study, by systematically controlling the structural and sequence similarity between the training and test proteins of the PDBbind benchmark, we demonstrate that protein structural and sequence similarity makes a significant impact on machine-learning-based methods. After removal of the training proteins that structure alignment and sequence alignment identify as highly similar to the test proteins, machine-learning-based methods trained on the new training sets no longer outperform the conventional scoring functions. In contrast, the performance of conventional functions like X-Score is relatively stable no matter what training data are used to fit the weights of its energy terms.
Pages: 1007-1012
Number of pages: 6
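
Illustrative note: the similarity-controlled protocol summarized in the abstract (drop training proteins that are too similar to any test protein, retrain, then compare Pearson correlations) might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the precomputed `similarity` dictionary, the cutoff value, and the protein identifiers are all hypothetical, and the paper's actual alignment tools and thresholds are not given in this record.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple


def filter_training_set(
    train_ids: Iterable[str],
    test_ids: Iterable[str],
    similarity: Dict[Tuple[str, str], float],
    cutoff: float,
) -> List[str]:
    """Keep training proteins whose highest similarity to any test protein
    does not exceed `cutoff`.

    `similarity` maps (train_id, test_id) to a precomputed score, for example
    a structure-alignment score or a sequence identity (assumption: higher
    means more similar; missing pairs are treated as 0.0).
    """
    test_list = list(test_ids)
    kept = []
    for tr in train_ids:
        max_sim = max(
            (similarity.get((tr, te), 0.0) for te in test_list),
            default=0.0,
        )
        if max_sim <= cutoff:
            kept.append(tr)
    return kept


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and measured values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical identifiers and scores, made up purely for illustration.
train = ["p1", "p2", "p3"]
test = ["t1", "t2"]
scores = {
    ("p1", "t1"): 0.9,
    ("p2", "t1"): 0.3,
    ("p2", "t2"): 0.4,
    ("p3", "t2"): 0.5,
}
reduced_train = filter_training_set(train, test, scores, cutoff=0.5)
```

A scoring function would then be refit on `reduced_train` only, and `pearson` applied to its predictions on the unchanged test set, so that any drop in correlation can be attributed to the removed similar proteins.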