GraphGPSM: a global scoring model for protein structure using graph neural networks

Cited by: 7

Authors:

He, Guangxing [1]

Liu, Jun [1]

Liu, Dong [1]

Zhang, Guijun [1]

Affiliations:

[1] Zhejiang Univ Technol, Coll Informat Engn, Hangzhou 310023, Peoples R China

Source:

BRIEFINGS IN BIOINFORMATICS | 2023 / Vol. 24 / Issue 04

Funding:

National Key Research and Development Program of China;

Keywords:

protein structures; scoring model; graph neural network; protein modeling; STRUCTURE PREDICTION; QUALITY; SEQUENCE; SERVER; SPACE;

DOI:

10.1093/bib/bbad219

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Scoring models used for protein structure modeling and ranking fall mainly into two classes: unified field and protein-specific scoring functions. Although protein structure prediction has made tremendous progress since CASP14, modeling accuracy still falls short of requirements in some cases; in particular, accurate modeling of multi-domain and orphan proteins remains a challenge. An accurate and efficient deep learning-based protein scoring model is therefore urgently needed to guide protein structure folding and ranking. In this work, we propose GraphGPSM, a global scoring model for protein structures based on an equivariant graph neural network (EGNN), to guide protein structure modeling and ranking. We construct an EGNN architecture and design a message passing mechanism to update and transmit information between the nodes and edges of the graph. The global score of the protein model is then output through a multilayer perceptron. Residue-level ultrafast shape recognition is used to describe the relationship between residues and the overall structure topology, and distances and directions encoded by Gaussian radial basis functions are designed to represent the overall topology of the protein backbone. These two features are combined with Rosetta energy terms, backbone dihedral angles, and inter-residue distances and orientations to represent the protein model, and are embedded into the nodes and edges of the graph neural network. Experimental results on the CASP13, CASP14 and CAMEO test sets show that GraphGPSM scores correlate strongly with the TM-scores of the models, and that these correlations are significantly better than those of the unified field score function REF2015 and of state-of-the-art local lDDT-based scoring models such as ModFOLD8, ProQ3D and DeepAccNet. Modeling experiments on 484 test proteins demonstrate that GraphGPSM can greatly improve modeling accuracy. GraphGPSM is further used to model 35 orphan proteins and 57 multi-domain proteins. The results show that the average TM-score of the models predicted by GraphGPSM is 13.2% and 7.1% higher, respectively, than that of the models predicted by AlphaFold2. GraphGPSM also participated in CASP15 and achieved competitive performance in global accuracy estimation.
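
The abstract mentions distances encoded by Gaussian radial basis functions as an edge feature. The following is a minimal sketch of that general technique, not the authors' implementation: the function names, the 0 to 20 angstrom range, the number of centers, and the choice of width equal to the center spacing are all illustrative assumptions, and the coordinates are made up.

```python
import math

def rbf_encode(distance, d_min=0.0, d_max=20.0, num_centers=16):
    """Expand one scalar distance into a vector of Gaussian RBF responses.

    All defaults are assumptions for illustration, not values from the paper.
    """
    # Centers are evenly spaced on [d_min, d_max]; the Gaussian width is
    # set to the spacing between neighbouring centers.
    step = (d_max - d_min) / (num_centers - 1)
    centers = [d_min + i * step for i in range(num_centers)]
    return [math.exp(-(((distance - mu) / step) ** 2)) for mu in centers]

def edge_features(coords, **rbf_kwargs):
    """Return {(i, j): rbf_vector} for every ordered pair of distinct residues."""
    features = {}
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            if i != j:
                # math.dist gives the Euclidean distance between two points.
                features[(i, j)] = rbf_encode(math.dist(a, b), **rbf_kwargs)
    return features

# Hypothetical C-alpha coordinates (in angstroms) for three residues.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
edges = edge_features(ca_coords, num_centers=5)
```

Each distance becomes a smooth, fixed-length vector whose largest entry marks the nearest center, which is generally easier for a network to consume than a raw scalar. In a graph model such vectors would typically be attached to the edges alongside other pairwise features.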

Pages: 11

