Bayesian Active Learning for Optimization and Uncertainty Quantification in Protein Docking

Cited by: 10

Authors:

Cao, Yue [1]

Shen, Yang [1,2]

Affiliations:

[1] Texas A&M Univ, Dept Elect & Comp Engn, College Stn, TX 77843 USA

[2] Texas A&M Univ, TEES AgriLife Ctr Bioinformat & Genom Syst Engn, College Stn, TX 77840 USA

Source:

JOURNAL OF CHEMICAL THEORY AND COMPUTATION | 2020 / Vol. 16 / Issue 08

Funding:

US National Science Foundation; US National Institutes of Health;

Keywords:

PREDICTION; REFINEMENT;

DOI:

10.1021/acs.jctc.0c00476

Chinese Library Classification:

O64 [Physical chemistry (theoretical chemistry), chemical physics];

Discipline classification codes:

070304 ; 081704 ;

Abstract:

Ab initio protein docking represents a major challenge for optimizing a noisy and costly "black box"-like function in a high-dimensional space. Despite progress in this field, there is a lack of rigorous uncertainty quantification (UQ). To fill the gap, we introduce a novel algorithm, Bayesian active learning (BAL), for optimization and UQ of such black-box functions with applications to flexible protein docking. BAL directly models the posterior distribution of the global optimum (i.e., native structures) with active sampling and posterior estimation iteratively feeding each other. Furthermore, it uses complex normal modes to span a homogeneous, Euclidean conformation space suitable for high-dimensional optimization and constructs funnel-like energy models for quality estimation of encounter complexes. Over a protein-docking benchmark set and a CAPRI set including homology docking, we establish that BAL significantly improves against starting points from rigid docking and refinements by particle swarm optimization, providing a top-3 near-native prediction for one third of the targets. Quality assessment empowered with UQ leads to tight quality intervals with half range around 25% of the actual interface root-mean-square deviation and confidence level at 85%. BAL's estimated probability of a prediction being near-native achieves binary classification AUROC at 0.93 and area under the precision-recall curve over 0.60 (compared to 0.50 and 0.14, respectively, by chance), which also improves ranking predictions. This study represents the first UQ solution for protein docking, with rigorous theoretical frameworks and comprehensive empirical assessments.
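The loop described in the abstract, where sampling and posterior estimation feed each other, can be illustrated with a toy sketch. This is not the authors' BAL implementation: the quadratic `noisy_energy` function, the Nadaraya-Watson smoother, the Boltzmann-style weights, and every parameter value (`rho`, `bandwidth`, step size) are assumptions chosen only to show the general idea on a 2-D problem.

```python
import numpy as np

def noisy_energy(x, rng):
    # Hypothetical stand-in for a costly black-box energy; true minimum at x = 0.
    return float(np.sum(x ** 2) + 0.1 * rng.normal())

def kernel_smooth(query, xs, ys, bandwidth=0.5):
    # Nadaraya-Watson estimate of the energy at each query point (assumed smoother).
    d2 = ((query[:, None, :] - xs[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-0.5 * d2 / bandwidth ** 2) + 1e-12
    return (w @ ys) / w.sum(axis=1)

def posterior_weights(xs, ys, rho):
    # Boltzmann-style weights over evaluated points: lower smoothed energy, higher weight.
    smooth = kernel_smooth(xs, xs, ys)
    p = np.exp(-rho * (smooth - smooth.min()))
    return p / p.sum()

def toy_active_search(dim=2, rounds=15, batch=20, rho=5.0, seed=0):
    rng = np.random.default_rng(seed)
    xs = rng.uniform(-3, 3, size=(batch, dim))
    ys = np.array([noisy_energy(x, rng) for x in xs])
    for _ in range(rounds):
        # Posterior estimation from all samples collected so far.
        p = posterior_weights(xs, ys, rho)
        # Active sampling: pick centers according to the posterior, then perturb them.
        idx = rng.choice(len(xs), size=batch, p=p)
        new_x = xs[idx] + 0.3 * rng.normal(size=(batch, dim))
        new_y = np.array([noisy_energy(x, rng) for x in new_x])
        xs = np.vstack([xs, new_x])
        ys = np.concatenate([ys, new_y])
    p = posterior_weights(xs, ys, rho)
    # Posterior mean as a point estimate of the optimum location.
    return p @ xs, p, xs

if __name__ == "__main__":
    mean, p, xs = toy_active_search()
    print("estimated optimum:", mean)
```

In this sketch the final weights `p` act as a crude discrete posterior over the optimum location, so they could also be summed over a region to give a rough probability that the optimum lies there, which is the kind of uncertainty statement the abstract refers to.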

Citation

Pages: 5334-5347

Page count: 14
