Graph convolutional network based virus-human protein-protein interaction prediction for novel viruses

Cited by: 6
Authors
Koca, Mehmet Burak [1]
Nourani, Esmaeil [2]
Abbasoglu, Ferda [1]
Karadeniz, Ilknur [3]
Sevilgen, Fatih Erdogan [1, 4]
Affiliations
[1] Gebze Tech Univ, Fac Engn, Dept Comp Engn, Kocaeli, Turkey
[2] Azarbaijan Shahid Madani Univ, Fac Comp Engn & Informat Technol, Dept Informat Technol, Tabriz, Iran
[3] Isik Univ, Fac Engn & Nat Sci, Dept Comp Engn, Istanbul, Turkey
[4] Bogazici Univ, Inst Data Sci & Artificial Intelligence, Istanbul, Turkey
Keywords
PHI networks; Graph convolutional networks; Protein-protein interaction prediction; HOST; GENE; WEB;
DOI
10.1016/j.compbiolchem.2022.107755
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Computational identification of human-virus protein-protein interactions (PHIs) is a worthwhile step towards understanding infection mechanisms. Analysis of PHI networks is important for the determination of pathogenic diseases. Predicting these interactions is an active research problem because experimental detection of PHIs is both time-consuming and expensive. Existing methods use biological features such as amino acid sequences, molecular structure, or biological activities for prediction. Recent studies show that the topological properties of proteins in protein-protein interaction (PPI) networks improve prediction performance. Basic network projections, random-walk-based models, or graph neural networks are used to generate topologically enriched (hybrid) protein embeddings. In this study, we propose a three-stage machine learning pipeline that generates and uses hybrid embeddings for PHI prediction. In the first stage, numerical features are extracted from the amino acid sequences using the Doc2Vec and Byte Pair Encoding methods. In the second stage, the amino acid embeddings are used as node features while training a modified GraphSAGE model, which is an improved version of the graph convolutional network. Lastly, the hybrid protein embeddings are used to train a binary interaction classifier that predicts whether two given proteins interact. The proposed method is evaluated with comprehensive experiments to test its functionality and compare it with state-of-the-art methods. The experimental results on the benchmark dataset show that the proposed model achieves a 3-23% higher area under the curve (AUC) score than its competitors.
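The idea of a hybrid embedding can be illustrated with a minimal sketch. This is not the authors' modified GraphSAGE: it has no learned weights, and the toy feature vectors stand in for the Doc2Vec sequence features. The names `mean_aggregate` and `pair_features`, the protein labels, and the graph are all hypothetical. The sketch shows only one GraphSAGE-style mean-aggregation step, in which each protein's sequence features are concatenated with the average of its neighbors' features, followed by concatenation of two protein embeddings into the input a binary classifier would receive.

```python
from typing import Dict, List

Vector = List[float]


def mean_aggregate(features: Dict[str, Vector],
                   neighbors: Dict[str, List[str]]) -> Dict[str, Vector]:
    """One GraphSAGE-style mean-aggregation step without learned weights.

    Each node's output is its own feature vector concatenated with the
    element-wise mean of its neighbors' feature vectors (zeros if isolated).
    """
    dim = len(next(iter(features.values())))
    hybrid: Dict[str, Vector] = {}
    for node, own in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            mean = [sum(features[n][i] for n in nbrs) / len(nbrs)
                    for i in range(dim)]
        else:
            mean = [0.0] * dim
        hybrid[node] = own + mean  # list concatenation: [self | neighborhood]
    return hybrid


def pair_features(embeddings: Dict[str, Vector], a: str, b: str) -> Vector:
    """Concatenate two protein embeddings into one classifier input."""
    return embeddings[a] + embeddings[b]


# Toy stand-ins for sequence-derived features (assumption: 2 dimensions).
features = {"H1": [1.0, 0.0], "H2": [0.0, 1.0], "V1": [0.5, 0.5]}
# Toy interaction graph: two human proteins (H1, H2) and one viral protein (V1).
neighbors = {"H1": ["H2", "V1"], "H2": ["H1"], "V1": ["H1"]}

hybrid = mean_aggregate(features, neighbors)
candidate = pair_features(hybrid, "V1", "H2")  # input for a binary classifier
```

In a real pipeline the aggregation would be followed by trainable weight matrices and a nonlinearity, and `candidate` would be passed to a trained classifier rather than inspected directly.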
Pages: 14