GTPLM-GO: Enhancing Protein Function Prediction Through Dual-Branch Graph Transformer and Protein Language Model Fusing Sequence and Local-Global PPI Information

Cited by: 0
Authors
Zhang, Haotian [1]
Sun, Yundong [1,2]
Wang, Yansong [1]
Luo, Xiaoling [3]
Liu, Yumeng [4]
Chen, Bin [1]
Jin, Xiaopeng [4]
Zhu, Dongjie [1]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Weihai 264209, Peoples R China
[2] Harbin Inst Technol, Dept Elect Sci & Technol, Harbin 150001, Peoples R China
[3] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[4] Shenzhen Technol Univ, Coll Big Data & Internet, Shenzhen 518118, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
protein function prediction; PPI networks; dual-branch graph transformer; graph neural networks; protein language model; LARGE-SCALE; ONTOLOGY; TOOL;
DOI
10.3390/ijms26094088
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010; 081704;
Abstract
Currently, protein-protein interaction (PPI) networks have become an essential data source for protein function prediction. However, methods utilizing graph neural networks (GNNs) face significant challenges in modeling PPI networks. A primary issue is over-smoothing, which occurs when multiple GNN layers are stacked to capture global information. This architectural limitation inherently impairs the integration of local and global information within PPI networks, thereby limiting the accuracy of protein function prediction. To effectively utilize information within PPI networks, we propose GTPLM-GO, a protein function prediction method based on a dual-branch Graph Transformer and protein language model. The dual-branch Graph Transformer achieves the collaborative modeling of local and global information in PPI networks through two branches: a graph neural network and a linear attention-based Transformer encoder. GTPLM-GO integrates local-global PPI information with the functional semantic encoding constructed by the protein language model, overcoming the issue of inadequate information extraction in existing methods. Experimental results demonstrate that GTPLM-GO outperforms advanced network-based and sequence-based methods on PPI network datasets of varying scales.
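The dual-branch idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`local_branch`, `global_branch`, `fuse`), the mean-aggregation step, the elu(x)+1 feature map, the absence of learned weights, and fusion by concatenation are all assumptions chosen for illustration. It uses plain Python lists to stay dependency-free.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def local_branch(adj, feats):
    """Hypothetical local branch: one mean-aggregation step over each
    node's PPI neighbours plus itself (a simplified GNN layer)."""
    n, d = len(adj), len(feats[0])
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j] or j == i]
        out.append([sum(feats[j][k] for j in nbrs) / len(nbrs) for k in range(d)])
    return out

def phi(x):
    """Positive feature map elu(x) + 1 (an assumed choice of kernel)."""
    return x + 1.0 if x > 0 else math.exp(x)

def global_branch(feats):
    """Hypothetical global branch: linear attention with Q = K = V = feats.
    K^T V is computed once, so the cost grows linearly with the node count."""
    q = [[phi(x) for x in row] for row in feats]
    k = q
    kv = matmul([list(col) for col in zip(*k)], feats)  # d x d summary
    k_sum = [sum(col) for col in zip(*k)]               # normaliser terms
    out = []
    for qi in q:
        z = sum(a * b for a, b in zip(qi, k_sum))
        out.append([v / z for v in matmul([qi], kv)[0]])
    return out

def fuse(local, glob, plm):
    """Concatenate local, global, and protein-language-model vectors per protein."""
    return [l + g + p for l, g, p in zip(local, glob, plm)]

# Toy PPI network: three proteins in a chain 0 - 1 - 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
plm = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]  # stand-in embeddings

local = local_branch(adj, feats)
glob = global_branch(feats)
fused = fuse(local, glob, plm)
```

Each row of `fused` would then feed a classifier over Gene Ontology terms. Keeping the two branches separate means the global view does not require stacking many message-passing layers, which is the over-smoothing concern the abstract raises.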
Pages: 18