GraphGPSM: a global scoring model for protein structure using graph neural networks

Cited by: 7
Authors
He, Guangxing [1]
Liu, Jun [1]
Liu, Dong [1]
Zhang, Guijun [1]
Affiliation
[1] Zhejiang University of Technology, College of Information Engineering, Hangzhou 310023, People's Republic of China
Funding
National Key Research and Development Program of China
Keywords
protein structures; scoring model; graph neural network; protein modeling; STRUCTURE PREDICTION; QUALITY; SEQUENCE; SERVER; SPACE;
DOI
10.1093/bib/bbad219
CLC number
Q5 [Biochemistry]
Discipline codes
071010; 081704
Abstract
The scoring models used for protein structure modeling and ranking fall mainly into two categories: unified-field and protein-specific scoring functions. Although protein structure prediction has made tremendous progress since CASP14, modeling accuracy still falls short of requirements in some cases; in particular, accurate modeling of multi-domain and orphan proteins remains a challenge. There is therefore a pressing need for an accurate and efficient deep-learning-based scoring model to guide protein structure folding and model ranking. In this work, we propose GraphGPSM, a global scoring model for protein structures based on an equivariant graph neural network (EGNN), to guide protein structure modeling and ranking. We construct an EGNN architecture with a message passing mechanism that updates and transmits information between the nodes and edges of the graph; a multilayer perceptron then outputs the global score of the protein model. Residue-level ultrafast shape recognition (USR) is used to describe the relationship between each residue and the overall structural topology, and distances and directions encoded by Gaussian radial basis functions represent the overall topology of the protein backbone. These two features are combined with Rosetta energy terms, backbone dihedral angles, and inter-residue distances and orientations to represent the protein model, and are embedded into the nodes and edges of the graph neural network. Experimental results on the CASP13, CASP14 and CAMEO test sets show that GraphGPSM scores correlate strongly with the TM-scores of the models, significantly outperforming the unified-field score function REF2015 and state-of-the-art local lDDT-based scoring models such as ModFOLD8, ProQ3D and DeepAccNet. Modeling experiments on 484 test proteins demonstrate that GraphGPSM can substantially improve modeling accuracy.
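To make the architecture described above more concrete, the following is a minimal, dependency-free Python sketch of an EGNN-based global scorer. It is not GraphGPSM's implementation: it follows the generic E(n)-equivariant layer of Satorras et al. (2021), uses only Gaussian RBF distance features on the edges (the direction encoding, USR, Rosetta energy terms and dihedral features are omitted), updates nodes and coordinates but not edges, and uses random untrained weights. All names (`rbf_encode`, `EGNNLayer`, `GlobalScorer`), the 15 Å cutoff, the RBF centres and the squared-distance rescaling are hypothetical choices for illustration. The point it demonstrates is that the pooled score depends only on invariant quantities, so rigidly rotating and translating the model leaves the score unchanged.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def silu(z):
    return z * sigmoid(z)

def rbf_encode(d, d_min=0.0, d_max=20.0, n_centers=8, gamma=0.5):
    # Gaussian radial basis expansion of a distance d (angstroms); centres are an assumption.
    step = (d_max - d_min) / (n_centers - 1)
    return [math.exp(-gamma * (d - (d_min + k * step)) ** 2) for k in range(n_centers)]

def linear(in_dim, out_dim, rng):
    # Randomly initialised (untrained) weight matrix and zero bias.
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
         for _ in range(out_dim)]
    return w, [0.0] * out_dim

def apply(layer, v, act=None):
    # Affine map W v + b, optionally followed by an element-wise activation.
    w, b = layer
    out = [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]
    return [act(o) for o in out] if act else out

def build_graph(coords, cutoff=15.0):
    # Directed edges between residues closer than `cutoff`, with RBF edge features.
    edges = {}
    for i, xi in enumerate(coords):
        for j, xj in enumerate(coords):
            if i != j and math.dist(xi, xj) <= cutoff:
                edges[(i, j)] = rbf_encode(math.dist(xi, xj))
    return edges

class EGNNLayer:
    # One E(n)-equivariant message passing layer in the style of Satorras et al. (2021).
    def __init__(self, h_dim, e_dim, m_dim, rng):
        self.m_dim = m_dim
        self.phi_e = linear(2 * h_dim + 1 + e_dim, m_dim, rng)  # message network
        self.phi_x = linear(m_dim, 1, rng)                     # per-edge coordinate weight
        self.phi_h = linear(h_dim + m_dim, h_dim, rng)         # node update network

    def __call__(self, h, x, edges):
        n = len(h)
        m_agg = [[0.0] * self.m_dim for _ in range(n)]
        dx = [[0.0, 0.0, 0.0] for _ in range(n)]
        for (i, j), a_ij in edges.items():
            diff = [x[i][k] - x[j][k] for k in range(3)]
            d2 = sum(c * c for c in diff) / 100.0  # squared distance, rescaled (assumption)
            m_ij = apply(self.phi_e, h[i] + h[j] + [d2] + a_ij, silu)
            w = apply(self.phi_x, m_ij)[0]
            for k in range(3):
                dx[i][k] += diff[k] * w  # equivariant coordinate update
            m_agg[i] = [a + b for a, b in zip(m_agg[i], m_ij)]
        x_new = [[x[i][k] + dx[i][k] / max(n - 1, 1) for k in range(3)] for i in range(n)]
        h_new = [[a + b for a, b in zip(h[i], apply(self.phi_h, h[i] + m_agg[i], silu))]
                 for i in range(n)]
        return h_new, x_new

class GlobalScorer:
    # EGNN layers -> mean pooling over residues -> two-layer MLP -> score in [0, 1].
    def __init__(self, h_dim=4, e_dim=8, m_dim=8, n_layers=2, seed=0):
        rng = random.Random(seed)
        self.layers = [EGNNLayer(h_dim, e_dim, m_dim, rng) for _ in range(n_layers)]
        self.mlp1 = linear(h_dim, 16, rng)
        self.mlp2 = linear(16, 1, rng)

    def score(self, node_feats, coords):
        edges = build_graph(coords)
        h, x = node_feats, coords
        for layer in self.layers:
            h, x = layer(h, x, edges)
        pooled = [sum(col) / len(h) for col in zip(*h)]
        return sigmoid(apply(self.mlp2, apply(self.mlp1, pooled, silu))[0])

# Toy 8-residue helix-like C-alpha trace (about 100 degrees per residue) with hypothetical node features.
coords = [[2.3 * math.cos(1.745 * i), 2.3 * math.sin(1.745 * i), 1.5 * i] for i in range(8)]
feats = [[math.sin(i), math.cos(i), 0.1 * i, 1.0] for i in range(8)]
model = GlobalScorer()
s = model.score(feats, coords)
# Rotate about z and translate: the global score should not change.
c, sn = math.cos(0.7), math.sin(0.7)
moved = [[c * px - sn * py + 3.0, sn * px + c * py - 2.0, pz + 5.0] for px, py, pz in coords]
s_moved = model.score(feats, moved)
print(round(s, 6), round(s_moved, 6))
```

Because messages see only squared distances and RBF features, and coordinates are updated along relative vectors, the two printed scores agree up to floating-point rounding. A trained model would of course need real per-residue features and learned weights.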
GraphGPSM is further used to model 35 orphan proteins and 57 multi-domain proteins. The results show that the average TM-scores of the models predicted by GraphGPSM are 13.2% and 7.1% higher, respectively, than those of the models predicted by AlphaFold2. GraphGPSM also participated in CASP15 and achieved competitive performance in global accuracy estimation.
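Global accuracy estimation is typically judged by how well predicted scores track the true TM-scores of a target's models and by how good the top-ranked model is. The Python sketch below shows one common style of such an evaluation: a per-target Pearson correlation and a "top-1 loss" (the TM-score lost by picking the highest-scored model instead of the truly best one), each averaged over targets. The function names and the toy scores are hypothetical, and the exact metrics and averaging used in the paper or in CASP15 may differ.

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def top1_loss(pred, tm):
    # TM-score lost by selecting the highest-scored model instead of the true best model.
    best_by_pred = max(range(len(pred)), key=lambda k: pred[k])
    return max(tm) - tm[best_by_pred]

def evaluate(targets):
    # targets: target id -> (predicted global scores, true TM-scores) for that target's models.
    rs = [pearson(p, t) for p, t in targets.values()]
    losses = [top1_loss(p, t) for p, t in targets.values()]
    return sum(rs) / len(rs), sum(losses) / len(losses)

# Hypothetical scores for two targets with three models each.
targets = {
    "T1": ([0.9, 0.7, 0.5], [0.8, 0.6, 0.4]),
    "T2": ([0.3, 0.6, 0.5], [0.5, 0.4, 0.7]),
}
mean_r, mean_loss = evaluate(targets)
print(f"mean per-target Pearson r = {mean_r:.3f}, mean top-1 loss = {mean_loss:.3f}")
```

On this toy data, T1 has a perfectly linear relationship (r = 1, zero loss), while T2 has r = -1/7 and a top-1 loss of 0.3, so the averages are 3/7 and 0.15. Averaging correlations per target, rather than pooling all models, avoids rewarding a scorer merely for separating easy targets from hard ones.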
Pages: 11