Structural Outlier Detection and Zernike-Canterakis Moments for Molecular Surface Meshes - Fast Implementation in Python

Times cited: 1
Authors
Banach, Mateusz [1]
Affiliations
[1] Jagiellonian Univ, Fac Med, Dept Bioinformat & Telemed, Med Coll, Medyczna 7, PL-30688 Krakow, Poland
Source
MOLECULES | 2024 / Vol. 29 / Issue 01
Keywords
bioinformatics; computational geometry; molecular surface; Numba; principal component analysis; protein structure; Python; shape retrieval; Zernike moments; PROTEIN-STRUCTURE ALIGNMENT; RELATE 2 SETS; CRYSTAL-STRUCTURE; CLASSIFICATION; EFFICIENT; ROTATION; COMPLEX; BIOLOGY; BINDING; TOOLS
DOI
10.3390/molecules29010052
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Object retrieval systems measure the degree of similarity between the shapes of 3D models. They search 3D model databases for the elements that resemble the query model. In structural bioinformatics, the query model is a protein tertiary/quaternary structure and the objective is to find similarly shaped molecules in the Protein Data Bank (PDB). With the ever-growing size of the PDB, a direct atomic coordinate comparison with all its members is impractical. To overcome this problem, the shape of the molecules can be encoded by fixed-length feature vectors. The distance of a protein to the entire PDB can then be measured in this low-dimensional domain in linear time. The state-of-the-art approaches use Zernike-Canterakis (ZC) moments for the shape encoding and supply the retrieval process with geometric data of the input structures. The BioZernike descriptors have been a standard utility of the PDB since 2020. However, few libraries for calculating the ZC moments locally are readily available for use in custom programs (i.e., without relying on external binaries), in particular programs written in Python. Here, a fast and well-documented Python implementation of the Pozo-Koehl (PK) algorithm is presented. In contrast to the more popular algorithm by Novotni and Klein, which is based on the voxelized volume, the PK algorithm produces ZC moments directly from the triangular surface meshes of 3D models. In particular, it can accept the molecular surfaces of proteins as its input. In the presented PK-Zernike library, owing to Numba's just-in-time compilation, a mesh with 50,000 facets is processed by a single thread in a second at moment order 20. Since this is the first time the PK algorithm has been used in structural bioinformatics, it is employed in a novel, simple, but efficient protein structure retrieval pipeline. The elimination of the outlying chain fragments via a fast PCA-based subroutine improves the discrimination ability, allowing this pipeline to achieve a 0.961 area under the ROC curve in the BioZernike validation suite (0.997 for the assemblies). The correlation between the results of the proposed approach and those of the 3D Surfer program attains values up to 0.99.
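The retrieval idea in the abstract (comparing fixed-length shape descriptors instead of atomic coordinates) can be illustrated with a minimal sketch. It is not taken from the PK-Zernike library: the function name `rank_by_descriptor_distance`, the use of Euclidean distance, the descriptor length of 121, and the random toy data are all assumptions made for illustration only.

```python
import numpy as np


def rank_by_descriptor_distance(query, database, top_k=3):
    """Return indices and distances of the top_k database rows closest to query.

    Hypothetical helper, not part of any published library.
    """
    query = np.asarray(query, dtype=float)
    database = np.asarray(database, dtype=float)
    # One pass over the database: O(N * D) for N descriptors of length D.
    distances = np.linalg.norm(database - query, axis=1)
    # argpartition selects the top_k smallest distances without a full sort.
    nearest = np.argpartition(distances, top_k - 1)[:top_k]
    # Sort only the selected top_k entries by increasing distance.
    nearest = nearest[np.argsort(distances[nearest])]
    return nearest, distances[nearest]


# Toy data: random vectors stand in for real shape descriptors (assumption).
rng = np.random.default_rng(0)
database = rng.random((1000, 121))
# The query is a slightly perturbed copy of entry 42.
query = database[42] + 0.001 * rng.standard_normal(121)

indices, dists = rank_by_descriptor_distance(query, database)
print(indices, dists)
```

Because each descriptor has a fixed length, the cost of one query grows linearly with the number of database entries, which is the property the abstract relies on. A real pipeline would replace the random vectors with descriptors computed from molecular surface meshes.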
Pages: 32