Structural Outlier Detection and Zernike-Canterakis Moments for Molecular Surface Meshes-Fast Implementation in Python

Cited by: 1
Authors
Banach, Mateusz [1 ]
Affiliations
[1] Jagiellonian Univ, Fac Med, Dept Bioinformat & Telemed, Med Coll, Medyczna 7, PL-30688 Krakow, Poland
Source
MOLECULES | 2024 / Volume 29 / Issue 01
Keywords
bioinformatics; computational geometry; molecular surface; Numba; principal component analysis; protein structure; Python; shape retrieval; Zernike moments; PROTEIN-STRUCTURE ALIGNMENT; RELATE; 2; SETS; CRYSTAL-STRUCTURE; CLASSIFICATION; EFFICIENT; ROTATION; COMPLEX; BIOLOGY; BINDING; TOOLS;
DOI
10.3390/molecules29010052
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Object retrieval systems measure the degree of similarity of the shape of 3D models. They search 3D model databases for the elements that resemble the query model. In structural bioinformatics, the query model is a protein tertiary/quaternary structure and the objective is to find similarly shaped molecules in the Protein Data Bank (PDB). With the ever-growing size of the PDB, a direct atomic coordinate comparison with all its members is impractical. To overcome this problem, the shape of the molecules can be encoded by fixed-length feature vectors. The distance of a protein to the entire PDB can be measured in this low-dimensional domain in linear time. The state-of-the-art approaches utilize Zernike-Canterakis (ZC) moments for the shape encoding and supply the retrieval process with geometric data of the input structures. The BioZernike descriptors have been a standard utility of the PDB since 2020. However, when trying to calculate the ZC moments locally, one encounters a shortage of libraries readily available for use in custom programs (i.e., without relying on external binaries), in particular programs written in Python. Here, a fast and well-documented Python implementation of the Pozo-Koehl (PK) algorithm is presented. In contrast to the more popular algorithm by Novotni and Klein, which is based on the voxelized volume, the PK algorithm produces ZC moments directly from the triangular surface meshes of 3D models. In particular, it can accept the molecular surfaces of proteins as its input. In the presented PK-Zernike library, owing to Numba's just-in-time compilation, a mesh with 50,000 facets is processed by a single thread in a second at moment order 20. Since this is the first time the PK algorithm is used in structural bioinformatics, it is employed in a novel, simple, but efficient protein structure retrieval pipeline.
The elimination of the outlying chain fragments via a fast PCA-based subroutine improves the discrimination ability, allowing this pipeline to achieve an area under the ROC curve of 0.961 in the BioZernike validation suite (0.997 for the assemblies). The correlation between the results of the proposed approach and those of the 3D Surfer program attains values up to 0.99.
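The linear-time comparison against fixed-length feature vectors mentioned in the abstract can be illustrated with a minimal sketch. It is not the PK-Zernike API: the function name `rank_by_descriptor_distance`, the Euclidean metric, the random stand-in data, and the vector length of 121 are assumptions chosen only for illustration.

```python
import numpy as np


def rank_by_descriptor_distance(query, database, top_k=5):
    """Return (indices, distances) of the top_k database rows closest to query.

    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    query = np.asarray(query, dtype=float)
    database = np.asarray(database, dtype=float)
    # One pass over all N descriptors of length d: O(N * d) work.
    distances = np.linalg.norm(database - query, axis=1)
    top_k = min(top_k, len(distances))
    # argpartition selects the top_k smallest distances without a full sort.
    candidates = np.argpartition(distances, top_k - 1)[:top_k]
    # Only the small candidate set is sorted, so the scan stays linear in N.
    order = candidates[np.argsort(distances[candidates])]
    return order, distances[order]


# Stand-in data: 1000 random "descriptors" of length 121 (illustrative only).
rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 121))
# A slightly perturbed copy of entry 42 plays the role of the query structure.
query = database[42] + 0.01 * rng.normal(size=121)

indices, dists = rank_by_descriptor_distance(query, database, top_k=3)
print(indices, dists)
```

Because every structure is reduced to a vector of the same length, the cost of a query grows with the number of database entries rather than with the number of atoms in each entry.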
Pages: 32