A semi-supervised deep learning method based on stacked sparse auto-encoder for cancer prediction using RNA-seq data

Cited by: 70
Authors
Xiao, Yawen [1, 2]
Wu, Jun [3, 4]
Lin, Zongli [5]
Zhao, Xiaodong [6]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China
[2] Minist Educ, Key Lab Syst Control & Informat Proc, Shanghai 200240, Peoples R China
[3] East China Normal Univ, Ctr Bioinformat & Computat Biol, Shanghai Key Lab Regulatory Biol, Inst Biomed Sci, Shanghai 200241, Peoples R China
[4] East China Normal Univ, Sch Life Sci, Shanghai 200241, Peoples R China
[5] Univ Virginia, Charles L Brown Dept Elect & Comp Engn, POB 400743, Charlottesville, VA 22904 USA
[6] Shanghai Jiao Tong Univ, Sch Biomed Engn, Shanghai 200240, Peoples R China
Keywords
Stacked sparse auto-encoder; Cancer prediction; Gene expression data; Semi-supervised learning; Deep learning; FEATURE-SELECTION; MACHINE; AUTOENCODER; DIAGNOSIS; PROGNOSIS;
DOI
10.1016/j.cmpb.2018.10.004
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Background and objective: Cancer is a complex health problem with high mortality. Over the past few decades, the rapid development of high-throughput sequencing technology and the application of various machine learning methods have enabled remarkable progress in cancer research based on gene expression data. At the same time, a growing amount of high-dimensional data, such as RNA-seq data, has been generated, which calls for machine learning methods that can handle large volumes of data effectively in order to support accurate treatment decisions. Methods: In this paper, we present a semi-supervised deep learning strategy, stacked sparse auto-encoder (SSAE) based classification, for cancer prediction using RNA-seq data. The proposed SSAE-based method employs greedy layer-wise pre-training and a sparsity penalty term to help capture and extract important information from the high-dimensional data and then classify the samples. Results: We tested the proposed SSAE model on three public RNA-seq data sets covering three types of cancer and compared its prediction performance with several commonly used classification methods. The results indicate that our approach outperforms the other methods on all three cancer data sets across various metrics. Conclusions: The proposed SSAE-based semi-supervised deep learning model shows a promising ability to process high-dimensional gene expression data and proves effective and accurate for cancer prediction. (C) 2018 Elsevier B.V. All rights reserved.
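The two ingredients named in the Methods sentence, a sparsity penalty and greedy layer-wise pre-training, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not give the form of the penalty, so the KL-divergence penalty used here is an assumption (a common choice for sparse auto-encoders), as are the sigmoid activations, the hyperparameters, the random stand-in data, and the hypothetical function names `train_sparse_layer` and `pretrain_stack`. The supervised fine-tuning and classification stage is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kl_sparsity(rho, rho_hat):
    """Sum over hidden units of KL(rho || rho_hat_j) (assumed penalty form)."""
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def train_sparse_layer(X, n_hidden, rho=0.05, beta=0.1, lr=0.5, epochs=200, seed=0):
    """Train one sparse auto-encoder layer with full-batch gradient descent.

    Returns the encoder weights, encoder bias, and the loss history.
    All hyperparameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    history = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                  # encoder activations
        X_hat = sigmoid(H @ W2 + b2)              # reconstruction
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)  # mean activation per unit
        recon = 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=1))
        history.append(recon + beta * kl_sparsity(rho, rho_hat))
        # Backpropagation: reconstruction term plus sparsity term.
        d_out = (X_hat - X) * X_hat * (1 - X_hat) / n
        d_sparse = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / n
        d_hid = (d_out @ W2.T + d_sparse) * H * (1 - H)
        W2 -= lr * (H.T @ d_out); b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid); b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, history

def pretrain_stack(X, layer_sizes, **kwargs):
    """Greedy layer-wise pre-training: each layer is trained on the codes
    produced by the already-trained layers below it."""
    params, A = [], X
    for size in layer_sizes:
        W, b, _ = train_sparse_layer(A, size, **kwargs)
        params.append((W, b))
        A = sigmoid(A @ W + b)                    # input for the next layer
    return params, A

# Random stand-in for a (samples x genes) matrix scaled to [0, 1]; not real RNA-seq data.
X = np.random.default_rng(1).random((60, 20))
params, codes = pretrain_stack(X, layer_sizes=[8, 4])
print(codes.shape)
```

In a complete pipeline of this kind, the learned codes (or the stacked encoder itself) would then feed a classifier trained on the labelled samples, which is the step the abstract describes as classifying the samples.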
Pages: 99-105
Page count: 7