A semi-supervised deep learning method based on stacked sparse auto-encoder for cancer prediction using RNA-seq data

Cited by: 70
Authors
Xiao, Yawen [1, 2]
Wu, Jun [3, 4]
Lin, Zongli [5]
Zhao, Xiaodong [6]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China
[2] Minist Educ, Key Lab Syst Control & Informat Proc, Shanghai 200240, Peoples R China
[3] East China Normal Univ, Ctr Bioinformat & Computat Biol, Shanghai Key Lab Regulatory Biol, Inst Biomed Sci, Shanghai 200241, Peoples R China
[4] East China Normal Univ, Sch Life Sci, Shanghai 200241, Peoples R China
[5] Univ Virginia, Charles L Brown Dept Elect & Comp Engn, POB 400743, Charlottesville, VA 22904 USA
[6] Shanghai Jiao Tong Univ, Sch Biomed Engn, Shanghai 200240, Peoples R China
Keywords
Stacked sparse auto-encoder; Cancer prediction; Gene expression data; Semi-supervised learning; Deep learning; FEATURE-SELECTION; MACHINE; AUTOENCODER; DIAGNOSIS; PROGNOSIS;
DOI
10.1016/j.cmpb.2018.10.004
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
Background and objective: Cancer has become a complex health problem due to its high mortality. Over the past few decades, with the rapid development of high-throughput sequencing technology and the application of various machine learning methods, remarkable progress in cancer research has been made based on gene expression data. At the same time, a growing amount of high-dimensional data, such as RNA-seq data, has been generated, which calls for machine learning methods that can handle large data sets effectively in order to support accurate treatment decisions. Methods: In this paper, we present a semi-supervised deep learning strategy, stacked sparse auto-encoder (SSAE) based classification, for cancer prediction using RNA-seq data. The proposed SSAE-based method employs greedy layer-wise pre-training and a sparsity penalty term to help capture and extract important information from the high-dimensional data and then classify the samples. Results: We tested the proposed SSAE model on three public RNA-seq data sets covering three types of cancer and compared its prediction performance with several commonly used classification methods. The results indicate that our approach outperforms the other methods on all three cancer data sets across various metrics. Conclusions: The proposed SSAE-based semi-supervised deep learning model shows a promising ability to process high-dimensional gene expression data and proves to be effective and accurate for cancer prediction. (C) 2018 Elsevier B.V. All rights reserved.
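The abstract mentions a sparsity penalty term but this record does not give its form. As a rough illustration only, the sketch below assumes the KL-divergence penalty commonly used in sparse auto-encoders; the function name `kl_sparsity_penalty`, the target sparsity `rho`, and the clamp value `eps` are hypothetical choices for this example, not details taken from the paper.

```python
import math


def kl_sparsity_penalty(activations, rho=0.05, eps=1e-8):
    """Sum over hidden units of KL(rho || rho_hat_j).

    Assumed form of the sparsity penalty (not confirmed by the source).
    activations: list of rows, one per sample; each row holds the
    hidden-unit activations in (0, 1), e.g. sigmoid outputs.
    rho: target mean activation for every hidden unit.
    """
    n_samples = len(activations)
    n_hidden = len(activations[0])
    penalty = 0.0
    for j in range(n_hidden):
        # Mean activation of hidden unit j across the batch.
        rho_hat = sum(row[j] for row in activations) / n_samples
        # Clamp so both logarithms stay finite.
        rho_hat = min(max(rho_hat, eps), 1.0 - eps)
        penalty += rho * math.log(rho / rho_hat)
        penalty += (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat))
    return penalty


# Hidden units that fire rarely are penalised less than units that fire often.
sparse_batch = [[0.05, 0.04], [0.05, 0.06]]
dense_batch = [[0.90, 0.80], [0.70, 0.95]]
print(kl_sparsity_penalty(sparse_batch) < kl_sparsity_penalty(dense_batch))
```

In a typical sparse auto-encoder this term would be added, with a weighting coefficient, to the reconstruction loss of each layer during layer-wise pre-training; whether the authors combine the terms in exactly this way is not stated here.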
Pages: 99-105
Page count: 7