Multi-modal features-based human-herpesvirus protein-protein interaction prediction by using LightGBM

Cited by: 0
Authors
Yang, Xiaodi [1]
Wuchty, Stefan [2, 3, 4]
Liang, Zeyin [1]
Ji, Li [1]
Wang, Bingjie [1]
Zhu, Jialin [1]
Zhang, Ziding [5]
Dong, Yujun [1]
Affiliations
[1] Peking Univ First Hosp, Dept Hematol, Beijing 100034, Peoples R China
[2] Univ Miami, Dept Comp Sci, Miami, FL USA
[3] Univ Miami, Inst Data Sci, Miami, FL USA
[4] Univ Miami, Sylvester Comprehens Canc Ctr, Miami, FL USA
[5] China Agr Univ, Coll Biol Sci, Beijing 100193, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
human-herpesvirus interaction; protein-protein interaction; multi-modal; embedding; LightGBM; prediction; VIRUS; NETWORK; DATABASE; GENOMES;
DOI
Not available
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
The identification of human-herpesvirus protein-protein interactions (PPIs) is an essential entry point for understanding the mechanisms of viral infection, especially in malignant tumor patients, in whom herpesvirus infection is common. While natural language processing (NLP)-based embedding techniques have emerged as powerful approaches, the application of multi-modal embedding feature fusion to the prediction of human-herpesvirus PPIs is still limited. Here, we established a LightGBM method based on multi-modal embedding feature fusion to predict human-herpesvirus PPIs. In particular, we applied document and graph embedding approaches to represent the sequence, network and function modal features of human and herpesviral proteins. Training our LightGBM models on our compiled non-rigorous and rigorous benchmarking datasets, we obtained significantly better performance than with individual-modal features. Furthermore, our model outperformed machine learning methods based on traditional feature encodings and state-of-the-art deep learning-based methods on various benchmarking datasets. In a transfer learning step, we show that our model, trained on a human-herpesvirus PPI dataset without cytomegalovirus data, can reliably predict human-cytomegalovirus PPIs, indicating that our method can comprehensively capture multi-modal fusion features of protein interactions across various herpesvirus subtypes. The implementation of our method is available at https://github.com/XiaodiYangpku/MultimodalPPI/.
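The feature fusion step described in the abstract can be illustrated with a minimal sketch. It assumes that fusion means plain concatenation of each protein's sequence, network and function embeddings, followed by concatenation of the human and viral vectors; the abstract does not state the fusion operator, so this is an assumption, not the authors' implementation. The names `MODALITIES`, `fuse_protein` and `fuse_pair`, as well as the toy embedding values, are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumed modality order; the abstract names these three modalities
# but does not specify how they are combined.
MODALITIES = ("sequence", "network", "function")


def fuse_protein(embeddings: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate one protein's per-modality embeddings in a fixed order."""
    fused: List[float] = []
    for modality in MODALITIES:
        fused.extend(embeddings[modality])
    return fused


def fuse_pair(human: Dict[str, Sequence[float]],
              virus: Dict[str, Sequence[float]]) -> List[float]:
    """Build one feature vector for a human-virus protein pair
    (assumption: human features first, then viral features)."""
    return fuse_protein(human) + fuse_protein(virus)


# Toy embeddings (made-up values, far shorter than real embeddings).
human = {"sequence": [0.1, 0.2], "network": [0.3], "function": [0.4]}
virus = {"sequence": [0.5, 0.6], "network": [0.7], "function": [0.8]}

pair_features = fuse_pair(human, virus)
print(len(pair_features))
```

In a setup like this, one such vector per protein pair, together with a binary interaction label, would form a row of the training matrix passed to a gradient boosting classifier such as LightGBM.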
Pages: 13