GraphGPSM: a global scoring model for protein structure using graph neural networks

Times cited: 7
Authors
He, Guangxing [1]
Liu, Jun [1]
Liu, Dong [1]
Zhang, Guijun [1]
Affiliation
[1] Zhejiang Univ Technol, Coll Informat Engn, Hangzhou 310023, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
protein structures; scoring model; graph neural network; protein modeling; STRUCTURE PREDICTION; QUALITY; SEQUENCE; SERVER; SPACE;
DOI
10.1093/bib/bbad219
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
The scoring models used for protein structure modeling and ranking fall mainly into two classes: unified-field scoring functions and protein-specific scoring functions. Although protein structure prediction has made tremendous progress since CASP14, modeling accuracy still falls short of requirements in some cases; in particular, accurate modeling of multi-domain and orphan proteins remains a challenge. There is therefore an urgent need for an accurate and efficient deep-learning-based protein scoring model to guide protein structure folding and ranking. In this work, we propose GraphGPSM, a global scoring model for protein structures based on an equivariant graph neural network (EGNN), to guide protein structure modeling and ranking. We construct an EGNN architecture with a message-passing mechanism that updates and transmits information between the nodes and edges of the graph; a multilayer perceptron then outputs the global score of the protein model. Residue-level ultrafast shape recognition (USR) describes the relationship between each residue and the overall structure topology, and distances and directions encoded by Gaussian radial basis functions represent the overall topology of the protein backbone. These two features are combined with Rosetta energy terms, backbone dihedral angles, and inter-residue distances and orientations to represent the protein model, and are embedded into the nodes and edges of the graph neural network. Experimental results on the CASP13, CASP14 and CAMEO test sets show that GraphGPSM scores correlate strongly with the TM-scores of the models, significantly better than the unified-field scoring function REF2015 and state-of-the-art local lDDT-based scoring models such as ModFOLD8, ProQ3D and DeepAccNet. Modeling experiments on 484 test proteins show that GraphGPSM substantially improves modeling accuracy.
GraphGPSM is further used to model 35 orphan proteins and 57 multi-domain proteins. The average TM-scores of the models predicted by GraphGPSM are 13.2% and 7.1% higher, respectively, than those of the models predicted by AlphaFold2. GraphGPSM also participated in CASP15 and achieved competitive performance in global accuracy estimation.
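The abstract names three geometric ingredients: Gaussian radial-basis encoding of distances, residue-level ultrafast shape recognition, and EGNN message passing followed by a global readout. The sketch below shows how such pieces can fit together on Cα coordinates. It is not the GraphGPSM implementation. All function names are hypothetical. The following are assumptions: the RBF range of 0 to 20 Å with 16 centres; reading residue-level USR as each residue's distances to Ballester's four USR reference points; hand-set scalar weights in place of learned MLPs; and a sigmoid of the mean node feature in place of the MLP readout. Direction encoding, Rosetta energy terms, dihedrals and orientations are omitted.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def dist(a: Vec, b: Vec) -> float:
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def gaussian_rbf(d: float, d_min: float = 0.0, d_max: float = 20.0,
                 n_bins: int = 16) -> List[float]:
    """Expand a distance d (Angstrom) into n_bins Gaussian basis values.
    Centres are evenly spaced over [d_min, d_max]; the width equals the
    centre spacing. Range and bin count are illustrative assumptions."""
    step = (d_max - d_min) / (n_bins - 1)
    return [math.exp(-((d - (d_min + k * step)) / step) ** 2)
            for k in range(n_bins)]


def usr_reference_points(coords: Sequence[Vec]) -> List[Vec]:
    """Ballester's four USR reference points: centroid (ctd), point closest
    to ctd (cst), point farthest from ctd (fct), point farthest from fct (ftf)."""
    n = len(coords)
    ctd = tuple(sum(c[k] for c in coords) / n for k in range(3))
    cst = min(coords, key=lambda c: dist(c, ctd))
    fct = max(coords, key=lambda c: dist(c, ctd))
    ftf = max(coords, key=lambda c: dist(c, fct))
    return [ctd, cst, fct, ftf]


def residue_usr(coords: Sequence[Vec]) -> List[List[float]]:
    """Assumed residue-level variant: distances from each residue's CA
    to the four global USR reference points."""
    refs = usr_reference_points(coords)
    return [[dist(c, r) for r in refs] for c in coords]


def relu(v: float) -> float:
    return max(0.0, v)


def egnn_layer(h: List[float], x: List[Vec], a: float = 0.5, b: float = 0.5,
               c: float = -0.01, gamma: float = 0.1, u: float = 0.3,
               v: float = 0.2) -> Tuple[List[float], List[Vec]]:
    """One EGNN-style step (after Satorras et al., 2021) with hypothetical
    scalar weights a, b, c, gamma, u, v standing in for learned MLPs."""
    n = len(h)
    new_h, new_x = [], []
    for i in range(n):
        agg_m = 0.0
        shift = [0.0, 0.0, 0.0]
        for j in range(n):
            if j == i:
                continue
            d2 = dist(x[i], x[j]) ** 2
            m_ij = relu(a * h[i] + b * h[j] + c * d2)  # invariant edge message
            agg_m += m_ij
            for k in range(3):  # equivariant coordinate update along x_i - x_j
                shift[k] += (x[i][k] - x[j][k]) * gamma * m_ij
        new_h.append(h[i] + relu(u * h[i] + v * agg_m))  # residual node update
        new_x.append(tuple(x[i][k] + shift[k] / (n - 1) for k in range(3)))
    return new_h, new_x


def global_score(h: List[float]) -> float:
    """Mean-pool node features and squash to (0, 1); stand-in for the MLP."""
    return 1.0 / (1.0 + math.exp(-sum(h) / len(h)))


# Usage on a toy four-residue CA trace.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (2.0, 5.5, 1.0)]
h0 = [sum(f) / len(f) for f in residue_usr(ca)]  # one scalar feature per residue
h1, x1 = egnn_layer(h0, ca)
print(round(global_score(h1), 4))
```

Node messages depend only on features and squared distances, so the global score is unchanged by rotating or translating the input structure, while the updated coordinates move with it. This invariance is the property an EGNN-based global scorer relies on.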
Pages: 11
References (41 in total)
  • [1] The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design
    Alford, Rebecca F.
    Leaver-Fay, Andrew
    Jeliazkov, Jeliazko R.
    O'Meara, Matthew J.
    DiMaio, Frank P.
    Park, Hahnbeom
    Shapovalov, Maxim V.
    Renfrew, P. Douglas
    Mulligan, Vikram K.
    Kappel, Kalli
    Labonte, Jason W.
    Pacella, Michael S.
    Bonneau, Richard
    Bradley, Philip
    Dunbrack, Roland L., Jr.
    Das, Rhiju
    Baker, David
    Kuhlman, Brian
    Kortemme, Tanja
    Gray, Jeffrey J.
    [J]. JOURNAL OF CHEMICAL THEORY AND COMPUTATION, 2017, 13 (06) : 3031 - 3048
  • [2] Accurate prediction of protein structures and interactions using a three-track neural network
    Baek, Minkyung
    DiMaio, Frank
    Anishchenko, Ivan
    Dauparas, Justas
    Ovchinnikov, Sergey
    Lee, Gyu Rie
    Wang, Jue
    Cong, Qian
    Kinch, Lisa N.
    Schaeffer, R. Dustin
    Millan, Claudia
    Park, Hahnbeom
    Adams, Carson
    Glassman, Caleb R.
    DeGiovanni, Andy
    Pereira, Jose H.
    Rodrigues, Andria V.
    van Dijk, Alberdina A.
    Ebrecht, Ana C.
    Opperman, Diederik J.
    Sagmeister, Theo
    Buhlheller, Christoph
    Pavkov-Keller, Tea
    Rathinaswamy, Manoj K.
    Dalwadi, Udit
    Yip, Calvin K.
    Burke, John E.
    Garcia, K. Christopher
    Grishin, Nick V.
    Adams, Paul D.
    Read, Randy J.
    Baker, David
    [J]. SCIENCE, 2021, 373 (6557) : 871 - +
  • [3] Ultrafast shape recognition: method and applications
    Ballester, Pedro J.
    [J]. FUTURE MEDICINAL CHEMISTRY, 2011, 3 (01) : 65 - 78
  • [4] QMEAN: A comprehensive scoring function for model quality assessment
    Benkert, Pascal
    Tosatto, Silvio C. E.
    Schomburg, Dietmar
    [J]. PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2008, 71 (01) : 261 - 277
  • [5] The Protein Data Bank
    Berman, HM
    Westbrook, J
    Feng, Z
    Gilliland, G
    Bhat, TN
    Weissig, H
    Shindyalov, IN
    Bourne, PE
    [J]. NUCLEIC ACIDS RESEARCH, 2000, 28 (01) : 235 - 242
  • [6] Borkar V. S., 2022, Resonance, V27, P1263
  • [7] Improved prediction of protein-protein interactions using AlphaFold2 (vol 13, 1265, 2022)
    Bryant, Patrick
    Pozzati, Gabriele
    Elofsson, Arne
    [J]. NATURE COMMUNICATIONS, 2022, 13 (01)
  • [8] Single-sequence protein structure prediction using a language model and deep learning
    Chowdhury, Ratul
    Bouatta, Nazim
    Biswas, Surojit
    Floristean, Christina
    Kharkare, Anant
    Roye, Koushik
    Rochereau, Charlotte
    Ahdritz, Gustaf
    Zhang, Joanna
    Church, George M.
    Sorger, Peter K.
    AlQuraishi, Mohammed
    [J]. NATURE BIOTECHNOLOGY, 2022, 40 (11) : 1617 - +
  • [9] Gilmer J., INT C MACH LEARN
  • [10] DeepUMQA: ultrafast shape recognition-based protein model quality assessment using deep learning
    Guo, Sai-Sai
    Liu, Jun
    Zhou, Xiao-Gen
    Zhang, Gui-Jun
    [J]. BIOINFORMATICS, 2022, 38 (07) : 1895 - 1903