Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

Cited by: 186
Authors
Cang, Zixuan [1]
Mu, Lin [2]
Wei, Guo-Wei [1, 3, 4]
Affiliations
[1] Michigan State Univ, Dept Math, E Lansing, MI 48824 USA
[2] Oak Ridge Natl Lab, Comp Sci & Math Div, Oak Ridge, TN USA
[3] Michigan State Univ, Dept Biochem & Mol Biol, E Lansing, MI 48824 USA
[4] Michigan State Univ, Dept Elect & Comp Engn, E Lansing, MI 48824 USA
Funding
National Science Foundation (United States);
Keywords
BINDING-AFFINITY PREDICTION; PERSISTENT HOMOLOGY; NEURAL-NETWORKS; DOCKING; ACCURACY; CONSTRUCTION; RECOGNITION; SOFTWARE; DYNAMICS; VOLUME;
DOI
10.1371/journal.pcbi.1005929
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence, for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with the Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensembles of trees, and deep convolutional neural networks, to demonstrate their descriptive and predictive power for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test the scoring power and the discriminatory power, respectively, of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
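To make the pairing of persistent homology with the Wasserstein distance more concrete, the following is a minimal sketch of a p-Wasserstein distance between two persistence diagrams, each given as a list of (birth, death) pairs. It is an illustration only and not the authors' implementation: the function names (`diagonal_projection`, `linf`, `wasserstein`) are hypothetical, the sup-norm ground metric and the default p = 2 are assumptions, and the brute-force search over all matchings is practical only for diagrams with a handful of points.

```python
from itertools import permutations


def diagonal_projection(point):
    """Closest point on the diagonal birth == death (hypothetical helper)."""
    birth, death = point
    mid = (birth + death) / 2.0
    return (mid, mid)


def linf(a, b):
    """Sup-norm distance between two points in the birth-death plane."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def wasserstein(diag_a, diag_b, p=2):
    """Brute-force p-Wasserstein distance between two small persistence diagrams.

    Each diagram is padded with diagonal slots so that any point may be
    matched either to a point of the other diagram or to the diagonal.
    """
    n, m = len(diag_a), len(diag_b)
    size = n + m
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                c = linf(diag_a[i], diag_b[j])  # point matched to point
            elif i < n:
                c = linf(diag_a[i], diagonal_projection(diag_a[i]))  # A to diagonal
            elif j < m:
                c = linf(diag_b[j], diagonal_projection(diag_b[j]))  # B to diagonal
            else:
                c = 0.0  # diagonal matched to diagonal costs nothing
            cost[i][j] = c ** p
    # Exhaustive search over perfect matchings (factorial cost, small inputs only).
    best = min(
        sum(cost[i][perm[i]] for i in range(size))
        for perm in permutations(range(size))
    )
    return best ** (1.0 / p)


# Toy diagrams with made-up values, purely for illustration.
ligand_1 = [(0.0, 4.0)]
ligand_2 = [(0.0, 3.0)]
distance = wasserstein(ligand_1, ligand_2)
```

In a k-nearest-neighbors setting such as the one the abstract mentions, distances of this kind would serve as the dissimilarity between molecules; for realistic diagram sizes the exhaustive search would need to be replaced by an assignment solver.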
Pages: 44