Multi-modal features-based human-herpesvirus protein-protein interaction prediction by using LightGBM

Cited by: 0
Authors
Yang, Xiaodi [1]
Wuchty, Stefan [2, 3, 4]
Liang, Zeyin [1]
Ji, Li [1]
Wang, Bingjie [1]
Zhu, Jialin [1]
Zhang, Ziding [5]
Dong, Yujun [1]
Affiliations
[1] Peking Univ First Hosp, Dept Hematol, Beijing 100034, Peoples R China
[2] Univ Miami, Dept Comp Sci, Miami, FL USA
[3] Univ Miami, Inst Data Sci, Miami, FL USA
[4] Univ Miami, Sylvester Comprehens Canc Ctr, Miami, FL USA
[5] China Agr Univ, Coll Biol Sci, Beijing 100193, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
human-herpesvirus interaction; protein-protein interaction; multi-modal; embedding; LightGBM; prediction; VIRUS; NETWORK; DATABASE; GENOMES;
DOI
Not available
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Identifying human-herpesvirus protein-protein interactions (PPIs) is an essential entry point for understanding the mechanisms of viral infection, especially in patients with malignant tumors, in whom herpesvirus infection is common. While natural language processing (NLP)-based embedding techniques have emerged as powerful approaches, the application of multi-modal embedding feature fusion to the prediction of human-herpesvirus PPIs is still limited. Here, we established a LightGBM method based on multi-modal embedding feature fusion to predict human-herpesvirus PPIs. In particular, we applied document and graph embedding approaches to represent the sequence, network and function modal features of human and herpesviral proteins. Training our LightGBM models on our compiled non-rigorous and rigorous benchmarking datasets, we obtained significantly better performance than with individual-modal features. Furthermore, our model outperformed machine learning methods based on traditional feature encodings and state-of-the-art deep learning-based methods on various benchmarking datasets. In a transfer learning step, we show that our model, trained on a human-herpesvirus PPI dataset without cytomegalovirus data, can reliably predict human-cytomegalovirus PPIs, indicating that our method can comprehensively capture multi-modal fusion features of protein interactions across various herpesvirus subtypes. The implementation of our method is available at https://github.com/XiaodiYangpku/MultimodalPPI/.
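The feature-fusion idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the sequence, network and function embeddings are combined, so plain concatenation is an assumption here, and the names `MODALITIES`, `fuse_protein`, `fuse_pair` and the toy vectors are hypothetical, not taken from the authors' repository. In practice the fused pair vectors would serve as input rows for a gradient-boosting classifier such as LightGBM.

```python
from typing import Dict, List

# Assumed fixed order of the three modalities named in the abstract.
MODALITIES = ("sequence", "network", "function")

# Hypothetical container: modality name -> embedding vector for one protein.
Embedding = Dict[str, List[float]]


def fuse_protein(embeddings: Embedding) -> List[float]:
    """Concatenate one protein's per-modality embeddings in a fixed order."""
    fused: List[float] = []
    for modality in MODALITIES:
        fused.extend(embeddings[modality])
    return fused


def fuse_pair(human: Embedding, viral: Embedding) -> List[float]:
    """Build one feature vector for a human-viral protein pair.

    Assumption: the pair is represented by the human vector followed by
    the viral vector; the source does not specify the fusion operator.
    """
    return fuse_protein(human) + fuse_protein(viral)


# Toy embeddings (made-up numbers, far shorter than real embeddings).
human = {"sequence": [0.1, 0.2], "network": [0.3], "function": [0.4, 0.5]}
viral = {"sequence": [0.6, 0.7], "network": [0.8], "function": [0.9, 1.0]}

pair_features = fuse_pair(human, viral)
```

Keeping the modality order fixed means every pair vector has the same column layout, which a tree-based classifier needs in order to learn consistent splits.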
Pages: 13