Predicting liver cancer on epigenomics data using machine learning

Cited by: 0
Authors
Vekariya, Vishalkumar [1]
Passi, Kalpdrum [1]
Jain, Chakresh Kumar [2]
Affiliations
[1] Laurentian Univ, Sch Engn & Comp Sci, Sudbury, ON, Canada
[2] Jaypee Inst Informat Technol, Dept Biotechnol, Noida, India
Source
FRONTIERS IN BIOINFORMATICS | 2022 / Vol. 2
Keywords
epigenomics; histone; DNA methylation; human genome; RNA; BETA-CATENIN
DOI
10.3389/fbinf.2022.954529
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Epigenomics is the branch of biology concerned with phenotype modifications that do not change the DNA sequence of the cell. Epigenetic modifications change the properties of DNA and can ultimately prevent certain DNA functions from being carried out. These alterations arise in cancer cells and are a major contributor to cancer. The liver is the metabolic cleansing center of the human body and the only organ that can regenerate itself, but liver cancer can stop it from cleansing the body. In this research, machine learning techniques are used to predict the gene expression of liver cells for liver hepatocellular carcinoma (LIHC), which is the third leading cause of cancer death and affects five hundred thousand people per year. The LIHC data comprise four types: methylation, histone, human genome, and RNA sequence data. The data were accessed from The Cancer Genome Atlas (TCGA) using open-source tools in the R programming language. The proposed method considers 1,000 features across the four data types. Nine feature selection methods and eight classification methods were compared, using 5-fold cross-validation and different training-to-test ratios, to select the best model. The best model used 140 features selected by ReliefF with the XGBoost classifier and predicted liver cancer with an AUC of 1.0 and an accuracy of 99.67%.
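The workflow summarized in the abstract (rank features, keep the top few, train a classifier, score with 5-fold cross-validation) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' pipeline, and everything in it is an assumption made for illustration: the data are synthetic, `relief_weights` implements the basic two-class Relief algorithm rather than ReliefF, and a nearest-centroid classifier stands in for XGBoost. All function names are hypothetical.

```python
import random


def relief_weights(X, y):
    """Basic two-class Relief (a simplification of ReliefF): a feature gains
    weight when it differs on the nearest miss and loses weight when it
    differs on the nearest hit."""
    n, d = len(X), len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]  # avoid division by zero

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return w


def fit_centroids(X, y, features):
    """Nearest-centroid model (illustrative stand-in for XGBoost)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
    return centroids


def predict(centroids, x, features):
    def sqdist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(features, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def cross_validate(X, y, n_features, k=5, seed=0):
    """k-fold cross-validated accuracy; features are re-selected inside each
    fold so the held-out samples never influence the ranking."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        w = relief_weights(X_train, y_train)
        top = sorted(range(len(w)), key=lambda j: w[j], reverse=True)[:n_features]
        model = fit_centroids(X_train, y_train, top)
        correct += sum(predict(model, X[i], top) == y[i] for i in fold)
    return correct / len(X)


def make_toy_data(n=60, d=10, informative=2, seed=1):
    """Synthetic two-class data: only the first `informative` columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(d)]
        for j in range(informative):
            row[j] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
accuracy = cross_validate(X, y, n_features=2)
print(f"5-fold accuracy with 2 selected features: {accuracy:.2f}")
```

Selecting features inside each fold, rather than once on the full dataset, is a design choice of this sketch; the abstract does not state which arrangement the authors used.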
Pages: 14