Machine Learning Analysis of RNA-seq Data for Diagnostic and Prognostic Prediction of Colon Cancer

Cited by: 11
Authors
Bostanci, Erkan [1]
Kocak, Engin [2]
Unal, Metehan [1]
Guzel, Mehmet Serdar [1]
Acici, Koray [3]
Asuroglu, Tunc [4]
Affiliations
[1] Ankara Univ, Fac Engn, Dept Comp Engn, TR-06830 Ankara, Turkiye
[2] Univ Hlth Sci, Fac Gulhane Pharm, Dept Analyt Chem, TR-06018 Ankara, Turkiye
[3] Ankara Univ, Fac Engn, Dept Artificial Intelligence & Data Engn, TR-06830 Ankara, Turkiye
[4] Tampere Univ, Fac Med & Hlth Technol, Tampere 33720, Finland
Keywords
transcriptomics; RNA-seq; machine learning; deep learning; classification; cancer prediction; exRNA; CLASSIFICATION; AGREEMENT; HEALTH
DOI
10.3390/s23063080
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
Data from omics studies have been used for prediction and classification of various diseases in biomedical and bioinformatics research. In recent years, Machine Learning (ML) algorithms have been applied in many areas of healthcare, especially for disease prediction and classification tasks. Integrating molecular omics data with ML algorithms offers a valuable opportunity to evaluate clinical data. RNA sequencing (RNA-seq) has emerged as the gold standard for transcriptomics analysis and is now widely used in clinical research. In the present work, RNA-seq data of extracellular vesicles (EV) from healthy individuals and colon cancer patients are analyzed. The aim is to develop models for the prediction of colon cancer and the classification of its stages. Five canonical ML classifiers and three Deep Learning (DL) classifiers are used to predict colon cancer in an individual from processed RNA-seq data. The classes are formed in two ways: by colon cancer stage and by cancer presence (healthy or cancer). The canonical ML classifiers, which are k-Nearest Neighbor (kNN), Logistic Model Tree (LMT), Random Tree (RT), Random Committee (RC), and Random Forest (RF), are tested with both forms of the data. In addition, for comparison with the canonical ML models, One-Dimensional Convolutional Neural Network (1-D CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM) DL models are used. The hyper-parameters of the DL models are optimized with a genetic algorithm (GA), a meta-heuristic optimization method. Among the canonical ML algorithms, the best accuracy in cancer prediction, 97.33%, is obtained with RC, LMT, and RF, while RT and kNN reach 95.33%. The best accuracy in cancer stage classification, 97.33%, is achieved with RF, followed by LMT, RC, kNN, and RT with 96.33%, 96%, 94.66%, and 94%, respectively.
Among the DL algorithms, the best accuracy in cancer prediction, 97.67%, is obtained with 1-D CNN, while BiLSTM and LSTM reach 94.33% and 93.67%, respectively. In cancer stage classification, the best accuracy, 98%, is achieved with BiLSTM, while 1-D CNN and LSTM reach 97% and 94.33%, respectively. The results show that neither canonical ML nor DL models are uniformly better: which family performs best depends on the number of features used.
Pages: 28