A semi-supervised deep learning method based on stacked sparse auto-encoder for cancer prediction using RNA-seq data

Cited by: 70
Authors
Xiao, Yawen [1, 2]
Wu, Jun [3, 4]
Lin, Zongli [5 ]
Zhao, Xiaodong [6 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China
[2] Minist Educ, Key Lab Syst Control & Informat Proc, Shanghai 200240, Peoples R China
[3] East China Normal Univ, Ctr Bioinformat & Computat Biol, Shanghai Key Lab Regulatory Biol, Inst Biomed Sci, Shanghai 200241, Peoples R China
[4] East China Normal Univ, Sch Life Sci, Shanghai 200241, Peoples R China
[5] Univ Virginia, Charles L Brown Dept Elect & Comp Engn, POB 400743, Charlottesville, VA 22904 USA
[6] Shanghai Jiao Tong Univ, Sch Biomed Engn, Shanghai 200240, Peoples R China
Keywords
Stacked sparse auto-encoder; Cancer prediction; Gene expression data; Semi-supervised learning; Deep learning; FEATURE-SELECTION; MACHINE; AUTOENCODER; DIAGNOSIS; PROGNOSIS;
DOI
10.1016/j.cmpb.2018.10.004
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Background and objective: Cancer has become a complex health problem due to its high mortality. Over the past few decades, with the rapid development of high-throughput sequencing technology and the application of various machine learning methods, remarkable progress in cancer research has been made based on gene expression data. At the same time, a growing amount of high-dimensional data, such as RNA-seq data, has been generated, which calls for machine learning methods that can handle massive data effectively in order to support accurate treatment decisions. Methods: In this paper, we present a semi-supervised deep learning strategy, stacked sparse auto-encoder (SSAE) based classification, for cancer prediction using RNA-seq data. The proposed SSAE based method employs greedy layer-wise pre-training and a sparsity penalty term to help capture and extract important information from the high-dimensional data and then classify the samples. Results: We tested the proposed SSAE model on three public RNA-seq data sets covering three types of cancer and compared its prediction performance with several commonly used classification methods. The results indicate that our approach outperforms the other methods on all three cancer data sets across various metrics. Conclusions: The proposed SSAE based semi-supervised deep learning model shows a promising ability to process high-dimensional gene expression data and proves effective and accurate for cancer prediction. (C) 2018 Elsevier B.V. All rights reserved.
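The Methods sentence above names three ingredients: sparse auto-encoders, greedy layer-wise pre-training, and a classifier on the learned features. The following is a minimal NumPy sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the KL-divergence form of the sparsity penalty, the sigmoid activations, the layer sizes, the hyperparameters `rho`, `beta` and `lr`, the logistic-regression classifier, and the synthetic data standing in for RNA-seq expression values. The record does not describe any of these details, and joint fine-tuning of the whole network is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kl_sparsity(rho, rho_hat):
    # KL divergence between the target activation rho and mean activations rho_hat.
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def train_sparse_autoencoder(X, n_hidden, rho=0.05, beta=1.0, lr=0.5,
                             epochs=200, seed=0):
    """Train one sparse auto-encoder layer with full-batch gradient descent.

    Returns the encoder weights, encoder bias, and the loss per epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)        # encoder activations
        X_hat = sigmoid(H @ W2 + b2)    # reconstruction
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)
        loss = 0.5 * np.sum((X_hat - X) ** 2) / n + beta * kl_sparsity(rho, rho_hat)
        losses.append(loss)
        # Backpropagate reconstruction error plus the sparsity penalty.
        d_out = (X_hat - X) * X_hat * (1 - X_hat) / n
        d_kl = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / n
        d_hid = (d_out @ W2.T + d_kl) * H * (1 - H)
        W2 -= lr * H.T @ d_out; b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_hid; b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, losses

def pretrain_stack(X, layer_sizes, **kw):
    # Greedy layer-wise pre-training: each layer learns from the previous layer's codes.
    params, H = [], X
    for size in layer_sizes:
        W, b, _ = train_sparse_autoencoder(H, size, **kw)
        params.append((W, b))
        H = sigmoid(H @ W + b)
    return params

def encode(X, params):
    H = X
    for W, b in params:
        H = sigmoid(H @ W + b)
    return H

def fit_logistic(H, y, lr=0.5, epochs=500):
    # Plain logistic regression on the learned codes (binary labels 0/1).
    w = np.zeros(H.shape[1]); b = 0.0
    for _ in range(epochs):
        g = (sigmoid(H @ w + b) - y) / len(y)
        w -= lr * H.T @ g; b -= lr * g.sum()
    return w, b

# Synthetic stand-in data: 60 samples, 20 features scaled to [0, 1].
rng = np.random.default_rng(0)
X = rng.random((60, 20))
y = (X[:, 0] > 0.5).astype(float)

# Unsupervised step: pre-train on all samples, labeled or not.
_, _, losses = train_sparse_autoencoder(X, 8)
params = pretrain_stack(X, [8, 4])
codes = encode(X, params)

# Supervised step: fit the classifier on the first 20 samples only.
w, b = fit_logistic(codes[:20], y[:20])
probs = sigmoid(codes @ w + b)
```

The semi-supervised aspect in this sketch is that `pretrain_stack` sees every sample while `fit_logistic` sees only the labeled subset. On real expression data the inputs would first need scaling to [0, 1], since the decoder ends in a sigmoid.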
Pages: 99 / 105
Number of pages: 7