A Parallel Implementation for Large-Scale TSR-based 3D Structural Comparisons of Protein and Amino Acid

Cited by: 0
Authors
Chen, Feng [1]
Milon, Tarikul I. [2]
Khajouie, Poorya [2, 3]
Myers, Antoinette [2]
Xu, Wu [2]
Affiliations
[1] Louisiana State Univ, Frey Comp Serv Ctr, High Performance Comp, Baton Rouge, LA 70803 USA
[2] Univ Louisiana, Dept Chem, POB 44370, Lafayette, LA 70504 USA
[3] Univ Louisiana, Ctr Adv Comp Studies, Lafayette, LA 70504 USA
Keywords
3D structure comparison; TSR-based method; Amino acid structure; Hybrid programming; MPI; OpenMP; OpenACC; Protein; BLAST; Combinatorial extension (CE); Structure alignment; Sequence alignment; Classification; Similarities; Algorithm; Database; Tool; Standard; Model
DOI
10.2174/0115748936306625240724102438
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Proteins play a vital role in sustaining life and must form specific 3D structures to carry out their essential biological functions. Structure comparison techniques benefit from the ever-expanding repository of the Protein Data Bank. Computational tools for comparing the 3D structures of proteins and amino acids play an important role in understanding protein function. The Triangular Spatial Relationship (TSR)-based method was developed for this purpose.

Methods: A parallelization strategy was developed and implemented on high-performance clusters using distributed- and shared-memory programming models, together with multi-core CPUs and many-core GPU accelerators. In the TSR-based method, the 3D structure of a protein or amino acid is represented by an integer vector. The strategy in this study is designed for large-scale TSR-based 3D structural comparisons of proteins and amino acids, and it can be adapted to other applications that use a vector data structure.

Results: Because the TSR-based method represents protein and amino acid structures as vectors, the comparison algorithm is well suited to parallelization on large-scale supercomputers. Performance studies on representative datasets were conducted to demonstrate the efficiency of the parallelization strategy. The strategy allows comparisons of large 3D protein or amino acid structure datasets to finish within a reasonable amount of time.

Conclusion: Case studies run with the parallel code demonstrate that applying either mirror image or feature selection in the TSR-based algorithms improves the classification of protein and amino acid 3D structures. The TSR keys have the advantage of enabling structure-based BLAST searches. The parallel code could serve as a reference for similar future studies.
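As a rough illustration of why a vector representation lends itself to parallel all-against-all comparison, the sketch below stores each structure as a multiset of integer keys and divides the unordered pairs among workers. It is not the authors' implementation: the generalized Jaccard measure, the round-robin split, the function names (`generalized_jaccard`, `pairs_for_rank`, `compare_all`), and the key values are all assumptions chosen for illustration, and the "ranks" are simulated serially rather than run under MPI, OpenMP, or OpenACC.

```python
from collections import Counter
from itertools import combinations


def generalized_jaccard(a: Counter, b: Counter) -> float:
    """Sum of per-key minimum counts divided by sum of per-key maximum counts."""
    keys = a.keys() | b.keys()
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 1.0


def pairs_for_rank(n: int, rank: int, size: int):
    """Round-robin share of the n*(n-1)/2 unordered pairs for one worker."""
    return [p for i, p in enumerate(combinations(range(n), 2)) if i % size == rank]


def compare_all(structures, size=2):
    """Serial stand-in for a parallel loop: each simulated rank handles its own pairs."""
    result = {}
    for rank in range(size):
        for i, j in pairs_for_rank(len(structures), rank, size):
            result[(i, j)] = generalized_jaccard(structures[i], structures[j])
    return result


# Hypothetical integer keys; real TSR keys would be computed from the structures.
structures = [
    Counter([101, 101, 205, 307]),
    Counter([101, 205, 205, 411]),
    Counter([999]),
]
similarity = compare_all(structures, size=2)
```

Each pair is independent of every other pair, so the work assigned to one rank never needs results from another; in a real distributed run, only the final similarity values would have to be gathered.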
Pages: 16