Prediction of viral oncoproteins through the combination of generative adversarial networks and machine learning techniques

Times cited: 0
Authors
Beltran, Jorge F. [1]
Herrera-Belen, Lisandra [2]
Yanez, Alejandro J. [3, 4]
Jimenez, Luis [1]
Affiliations
[1] Univ La Frontera, Fac Engn & Sci, Dept Chem Engn, Ave Francisco Salazar 01145, Temuco, Chile
[2] Univ Santo Tomas, Fac Ciencias, Dept Ciencias Basicas, Temuco, Chile
[3] Greenvolution SpA, Dept Invest & Desarrollo, Puerto Varas, Chile
[4] Interdisciplinary Ctr Aquaculture Res INCAR, Concepcion, Chile
Keywords
Virus; Oncoprotein; Machine learning; Deep learning; Random forest; Multilayer perceptron; Human papillomavirus
DOI
10.1038/s41598-024-77028-y
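Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract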
Viral oncoproteins play crucial roles in transforming normal cells into cancer cells and are a significant factor in the etiology of various cancers. Traditional identification of these oncoproteins is both time-consuming and costly. With advancements in computational biology, bioinformatics tools based on machine learning have emerged as effective methods for predicting biological activities. Here, for the first time, we propose an approach that combines Generative Adversarial Networks (GANs) with supervised learning methods to enhance the accuracy and generalizability of viral oncoprotein prediction. Our methodology evaluated multiple machine learning models: Random Forest, Multilayer Perceptron, Light Gradient Boosting Machine, eXtreme Gradient Boosting, and Support Vector Machine. In ten-fold cross-validation on our training dataset, the GAN-enhanced Random Forest model performed best, with 0.976 accuracy, 0.976 F1 score, 0.977 precision, 0.976 sensitivity, and an AUC of 1.0. In independent testing, this model achieved 0.982 accuracy, 0.982 F1 score, 0.982 precision, 0.982 sensitivity, and an AUC of 1.0. These results underpin our new tool, VirOncoTarget, which is accessible via a web application. We anticipate that VirOncoTarget will be a valuable resource for researchers, enabling rapid and reliable viral oncoprotein prediction and advancing our understanding of the role of these proteins in cancer biology.
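As an illustration of the evaluation terms used in the abstract, the following minimal sketch shows how accuracy, precision, sensitivity, and F1 score can be computed from binary predictions, and how a dataset can be partitioned into ten folds. It is not the authors' code: the function names `binary_metrics` and `k_fold_indices` are hypothetical, the labels are made-up toy values, the folds are contiguous and unstratified, and the record does not state which averaging scheme or splitting strategy the paper used.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, sensitivity (recall), and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds.

    Assumption: no shuffling or stratification, to keep the sketch short.
    """
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


if __name__ == "__main__":
    # Toy labels (1 = oncoprotein, 0 = non-oncoprotein); purely illustrative.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
    for train_idx, test_idx in k_fold_indices(25, k=10):
        print(len(train_idx), len(test_idx))
```

In a cross-validation run, a model would be fitted on each `train_idx` subset and scored on the matching `test_idx` subset, with the per-fold metrics then averaged.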
Pages: 11