Nonparametric density estimation and optimal bandwidth selection for protein unfolding and unbinding data
Cited by: 21
Authors:
Bura, E. [2,3]
Zhmurov, A. [1]
Barsegov, V. [1]
Affiliations:
[1] Univ Massachusetts, Dept Chem, Lowell, MA 01854 USA
[2] George Washington Univ, Dept Stat, Washington, DC 20052 USA
[3] Vertex Pharmaceut, Biometrics, Cambridge, MA 02139 USA
Dynamic force spectroscopy and steered molecular simulations have become powerful tools for analyzing the mechanical properties of proteins and the strength of protein-protein complexes and aggregates. Probability density functions of the unfolding forces and unfolding times for proteins, and of the rupture forces and bond lifetimes for protein-protein complexes, allow quantification of the forced unfolding and unbinding transitions and mapping of the biomolecular free energy landscape. Inferring the unknown probability distribution functions from experimental and simulated forced unfolding and unbinding data, and assessing analytically tractable models of protein unfolding and unbinding, both require the choice of a bandwidth. The choice of this quantity is typically subjective, as it draws heavily on the investigator's intuition and past experience. We describe several approaches for selecting the "optimal bandwidth" for nonparametric density estimators, such as the traditionally used histogram and the more advanced kernel density estimators. The performance of these methods is tested on unimodal and multimodal, skewed, long-tailed data of the kind typically observed in force spectroscopy experiments and in molecular pulling simulations. The results of these studies can serve as a guideline for selecting the optimal bandwidth to resolve the underlying distributions from forced unfolding and unbinding data for proteins.
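To make the role of the bandwidth concrete, the following is a minimal Python sketch of data-driven bandwidth selection for the two estimators named in the abstract. The abstract does not say which selectors the paper evaluates, so the two rules used here (the Freedman-Diaconis rule for the histogram bin width and Silverman's rule of thumb for a Gaussian kernel) are assumptions chosen because they are widely used. The gamma-distributed sample is synthetic and only stands in for skewed rupture-force data, and all function names are hypothetical.

```python
import numpy as np


def freedman_diaconis_bin_width(x):
    """Histogram bin width h = 2 * IQR * n^(-1/3) (Freedman-Diaconis rule)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    return 2.0 * (q75 - q25) * x.size ** (-1.0 / 3.0)


def silverman_bandwidth(x):
    """Gaussian-kernel bandwidth h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(x.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * spread * x.size ** (-1.0 / 5.0)


def gaussian_kde(x, grid, h):
    """Evaluate a Gaussian kernel density estimate with bandwidth h on grid."""
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance between every grid point and every observation.
    u = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (x.size * h * np.sqrt(2.0 * np.pi))


# Synthetic right-skewed sample standing in for rupture forces; not real data.
rng = np.random.default_rng(0)
forces = rng.gamma(shape=3.0, scale=20.0, size=500)

h_hist = freedman_diaconis_bin_width(forces)
h_kde = silverman_bandwidth(forces)

# Histogram density estimate using the selected bin width.
n_bins = int(np.ceil((forces.max() - forces.min()) / h_hist))
counts, edges = np.histogram(forces, bins=n_bins, density=True)

# Kernel density estimate using the selected bandwidth.
grid = np.linspace(0.0, forces.max() + 3.0 * h_kde, 400)
density = gaussian_kde(forces, grid, h_kde)

print(f"histogram bin width: {h_hist:.2f}, KDE bandwidth: {h_kde:.2f}")
```

Both rules depend only on the sample size and a robust measure of spread, which makes them cheap to compute. Rules of thumb of this kind are derived with near-normal data in mind and can oversmooth multimodal distributions, so they are best treated as a starting point rather than as the "optimal bandwidth" discussed above.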