A deep auto-encoder model for gene expression prediction

Cited by: 71
Authors
Xie, Rui [1]
Wen, Jia [2]
Quitadamo, Andrew [2]
Cheng, Jianlin [1]
Shi, Xinghua [2]
Affiliations
[1] Univ Missouri, Dept Comp Sci, Columbia, MO USA
[2] Univ N Carolina, Coll Comp & Informat, Dept Bioinformat & Genom, Univ City Blvd, Charlotte, NC 28223 USA
Source
BMC Genomics | 2017 / Vol. 18
Funding
U.S. National Science Foundation
Keywords
Predictive model; Stacked denoising auto-encoder; Multilayer perceptron; Deep learning; Gene expression; QUANTITATIVE TRAIT LOCI; RESIDUE CONTACTS; GENOME; TRANSCRIPTOME; NETWORKS; ARCHITECTURES; NUCLEOTIDE; SEQUENCE; MAP;
DOI
10.1186/s12864-017-4226-0
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Gene expression is a key intermediate level through which genotypes lead to a particular trait. It is affected by various factors, including the genotypes of genetic variants. To delineate the genetic impact on gene expression, we build a deep auto-encoder model to assess how well genetic variants contribute to gene expression changes. This new deep learning model is a regression-based predictive model that combines a MultiLayer Perceptron with a Stacked Denoising Auto-encoder (MLP-SAE). The model is trained using a stacked denoising auto-encoder for feature selection and a multilayer perceptron framework for backpropagation. We further improve the model by introducing dropout to prevent overfitting and improve performance.
Results: To demonstrate the usage of this model, we apply MLP-SAE to a real genomic dataset with genotypes and gene expression profiles measured in yeast. Our results show that the MLP-SAE model with dropout outperforms other models, including Lasso, Random Forests, and the MLP-SAE model without dropout. Using the MLP-SAE model with dropout, we show that gene expression quantifications predicted by the model solely from genotypes align well with true gene expression patterns.
Conclusion: We provide a deep auto-encoder model for predicting gene expression from SNP genotypes. This study demonstrates that deep learning is appropriate for tackling another genomic problem, i.e., building predictive models to understand genotypes' contribution to gene expression. With the emerging availability of richer genomic data, we anticipate that deep learning models will play a bigger role in modeling and interpreting genomics.
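The training scheme outlined in the abstract (denoising pretraining followed by backpropagation with dropout) can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' implementation: it uses a single denoising layer rather than a stack, and every function name, layer size, noise rate, dropout rate, and learning rate here is an assumption chosen for illustration.

```python
import math
import random

def corrupt(x, rate, rng):
    """Masking noise: zero each input with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in x]

def dropout_mask(n, rate, rng):
    """Inverted-dropout mask: 0 with probability `rate`, else 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [1.0 / keep if rng.random() < keep else 0.0 for _ in range(n)]

def encode(W, b, x):
    """Sigmoid hidden layer."""
    return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + bj)))
            for row, bj in zip(W, b)]

def linear(V, c, h):
    """Linear output layer."""
    return [sum(w * v for w, v in zip(row, h)) + ci for row, ci in zip(V, c)]

def backprop_step(W, b, V, c, x_in, target, mask, lr):
    """One SGD step on squared error through encode -> mask -> linear."""
    h = encode(W, b, x_in)
    hm = [hj * mj for hj, mj in zip(h, mask)]
    out = linear(V, c, hm)
    err = [o - t for o, t in zip(out, target)]
    # Gradient w.r.t. hidden pre-activations, computed before V is updated.
    dz = [sum(err[i] * V[i][j] for i in range(len(V))) * mask[j] * h[j] * (1.0 - h[j])
          for j in range(len(h))]
    for i, e in enumerate(err):
        c[i] -= lr * e
        for j in range(len(h)):
            V[i][j] -= lr * e * hm[j]
    for j, d in enumerate(dz):
        b[j] -= lr * d
        for i, xi in enumerate(x_in):
            W[j][i] -= lr * d * xi
    return sum(e * e for e in err)

def init(rows, cols, rng):
    """Small random weight matrix (illustrative initialisation)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def train(X, Y, n_hidden=4, noise=0.2, drop=0.5, epochs=50, lr=0.05, seed=0):
    """All hyperparameter defaults are hypothetical, not taken from the paper."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W, b = init(n_hidden, n_in, rng), [0.0] * n_hidden
    # Phase 1: unsupervised pretraining; reconstruct clean x from corrupted x.
    D, d = init(n_in, n_hidden, rng), [0.0] * n_in
    ones = [1.0] * n_hidden
    for _ in range(epochs):
        for x in X:
            backprop_step(W, b, D, d, corrupt(x, noise, rng), x, ones, lr)
    # Phase 2: supervised fine-tuning with dropout on the hidden layer.
    U, u = init(1, n_hidden, rng), [0.0]
    for _ in range(epochs):
        for x, y in zip(X, Y):
            backprop_step(W, b, U, u, x, [y], dropout_mask(n_hidden, drop, rng), lr)
    return W, b, U, u

def predict(model, x):
    """Predict one expression value; dropout is disabled at prediction time."""
    W, b, U, u = model
    return linear(U, u, encode(W, b, x))[0]
```

In this sketch, `X` would hold one genotype vector per sample (for example, 0/1 marker codes) and `Y` the expression level of a single gene; calling `train(X, Y)` and then `predict(model, x)` yields a prediction for a new genotype vector. The same `backprop_step` serves both phases, which keeps the pretraining and fine-tuning gradients consistent.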
Pages: 11