Public Data Set of Protein-Ligand Dissociation Kinetic Constants for Quantitative Structure-Kinetics Relationship Studies

Cited by: 9
Authors
Liu, Huisi [1,2]
Su, Minyi [2,3]
Lin, Hai-Xia [1]
Wang, Renxiao [4]
Li, Yan [4]
Affiliations
[1] Shanghai Univ, Coll Sci, Dept Chem, Shanghai 200444, Peoples R China
[2] Chinese Acad Sci, Shanghai Inst Organ Chem, State Key Lab Bioorgan & Nat Prod Chem, Shanghai 200032, Peoples R China
[3] Parc Cient Barcelona, Carrer Baldiri Reixac 4-8,Torre R,04A05, Barcelona 08028, Spain
[4] Fudan Univ, Sch Pharm, Dept Med Chem, Shanghai 201203, Peoples R China
Source
ACS OMEGA | 2022 / Vol. 7 / Issue 22
Funding
National Natural Science Foundation of China;
Keywords
DRUG DISCOVERY; ACCURATE DOCKING; BINDING-KINETICS; PREDICTION; CHEMBL; GLIDE; MODEL; LEAD; KDBI;
DOI
10.1021/acsomega.2c02156
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Protein-ligand binding affinity reflects the equilibrium thermodynamics of the protein-ligand binding process; binding/unbinding kinetics is the other side of the coin. Computational models of the quantitative structure-kinetics relationship (QSKR) aim to predict protein-ligand binding/unbinding kinetics from the protein structure, the ligand structure, or the structure of their complex, which in principle can provide a more rational basis for structure-based drug design. Thus far, most of the public data sets used for deriving such QSKR models are rather limited in sample size and structural diversity. To tackle this problem, we have compiled a set of 680 protein-ligand complexes with experimental dissociation rate constants (k(off)), curated mainly from the references accumulated for updating our PDBbind database. The three-dimensional structure of each protein-ligand complex in this data set was either retrieved from the Protein Data Bank or carefully modeled from a proper template. The entire data set covers 155 types of protein, with k(off) values spanning nearly 10 orders of magnitude. To the best of our knowledge, this data set is the largest of its kind reported publicly. Using this data set, we derived a random forest (RF) model based on protein-ligand atom pair descriptors for predicting k(off) values. We also demonstrated that using modeled structures as additional training samples benefits model performance. The RF model built with mixed structures can serve as a baseline for testing other, more sophisticated QSKR models. The whole data set, named PDBbind-koff-2020, is available for free download at our PDBbind-CN web site (http://www.pdbbind.org.cn/download.php).
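
The abstract mentions protein-ligand atom pair descriptors but does not define them here. As a rough illustration only, the sketch below counts protein-ligand element pairs that lie within a distance cutoff and converts a k(off) value to a log10 scale, which is a common way to handle values spanning many orders of magnitude. The function names, the 6.0 angstrom cutoff, the simple element typing, and the toy coordinates are all assumptions made for this example; they are not taken from the paper and may differ from the descriptors the authors actually used.

```python
import math
from collections import Counter


def atom_pair_counts(protein_atoms, ligand_atoms, cutoff=6.0):
    """Count protein-ligand element pairs within `cutoff` angstroms.

    Each atom is a tuple (element, (x, y, z)). Keys have the form
    "<protein element>-<ligand element>", e.g. "N-O".
    The cutoff value is an illustrative assumption.
    """
    counts = Counter()
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            # Euclidean distance between the two atoms
            if math.dist(p_xyz, l_xyz) <= cutoff:
                counts[f"{p_elem}-{l_elem}"] += 1
    return counts


def log_koff(koff_per_second):
    """Return log10 of a dissociation rate constant given in s^-1."""
    return math.log10(koff_per_second)


# Toy coordinates (hypothetical, not from the data set)
protein = [("N", (0.0, 0.0, 0.0)), ("C", (12.0, 0.0, 0.0))]
ligand = [("O", (3.0, 0.0, 0.0)), ("C", (4.0, 0.0, 0.0))]

features = atom_pair_counts(protein, ligand)
target = log_koff(1e-3)
```

A feature dictionary of this kind, computed per complex and paired with a log-scaled k(off) target, is the sort of input a random forest regressor could be trained on.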
Pages: 18985-18996
Page count: 12