GSAE: an autoencoder with embedded gene-set nodes for genomics functional characterization

Cited by: 37
Authors
Chen, Hung-I Harry [1, 2]
Chiu, Yu-Chiao [2]
Zhang, Tinghe [1]
Zhang, Songyao [1, 4]
Huang, Yufei [1]
Chen, Yidong [2, 3]
Affiliations
[1] Univ Texas San Antonio, Dept Elect & Comp Engn, San Antonio, TX 78249 USA
[2] Univ Texas Hlth Sci Ctr San Antonio, Greehey Childrens Canc Res Inst, San Antonio, TX 78229 USA
[3] Univ Texas Hlth Sci Ctr San Antonio, Dept Epidemiol & Biostat, San Antonio, TX 78229 USA
[4] Northwestern Polytech Univ, Lab Informat Fus Technol, Minist Educ, Sch Automat, Xian 710072, Shaanxi, Peoples R China
Keywords
Gene superset analysis; Survival analysis; Deep learning; Autoencoder; CELL LUNG-CANCER; NONSMALL CELL; ENRICHMENT ANALYSIS; POOR-PROGNOSIS; EXPRESSION; SURVIVAL; RECEPTOR; GROWTH; ATLAS
DOI
10.1186/s12918-018-0642-2
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Background: Bioinformatics tools have been developed to interpret gene expression data at the gene set level, and these gene-set-based analyses improve biologists' ability to discover the functional relevance of their experimental design. While gene sets are elucidated individually, associations between gene sets are rarely taken into consideration. Deep learning, an emerging machine learning technique in computational biology, can be used to generate an unbiased combination of gene sets and to determine the biological relevance and analysis consistency of these combined gene sets by leveraging large genomic data sets.

Results: In this study, we proposed a gene superset autoencoder (GSAE), a multi-layer autoencoder model incorporating a priori defined gene sets that retains the crucial biological features in the latent layer. We introduced the concept of the gene superset, an unbiased combination of gene sets with weights trained by the autoencoder, where each node in the latent layer is a superset. With the model trained on genomic data from TCGA and evaluated with the accompanying clinical parameters, we showed the ability of gene supersets to discriminate tumor subtypes, as well as their prognostic capability. We further demonstrated the biological relevance of the top component gene sets in the significant supersets.

Conclusions: Using an autoencoder model with gene supersets at its latent layer, we demonstrated that gene supersets retain sufficient biological information with respect to tumor subtypes and clinical prognostic significance. Supersets also provide high reproducibility in survival analysis and accurate prediction of cancer subtypes.
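The idea of embedding a priori gene sets in an autoencoder can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the abstract does not specify layer sizes, activations, or how gene-set membership is enforced, so the binary membership mask, the `tanh` activations, and every name below (`build_mask`, `encode`, the toy genes and sets) are assumptions chosen for illustration only. Training and the decoder are omitted.

```python
import numpy as np

def build_mask(genes, gene_sets):
    """Binary genes-by-sets matrix: 1 where a gene belongs to a gene set."""
    index = {g: i for i, g in enumerate(genes)}
    mask = np.zeros((len(genes), len(gene_sets)))
    for j, members in enumerate(gene_sets.values()):
        for g in members:
            if g in index:
                mask[index[g], j] = 1.0
    return mask

def encode(x, w_gene_to_set, mask, w_set_to_superset):
    """Hypothetical encoder: genes -> gene-set nodes -> superset (latent) nodes."""
    # Masking zeroes every weight from a gene to a set it does not belong to,
    # so each gene-set node only sees its own member genes.
    set_scores = np.tanh(x @ (w_gene_to_set * mask))
    # Each latent node is a weighted combination of gene-set nodes (a "superset").
    supersets = np.tanh(set_scores @ w_set_to_superset)
    return set_scores, supersets

# Toy data (entirely made up): 4 genes, 3 gene sets, 2 supersets, 5 samples.
rng = np.random.default_rng(0)
genes = ["G1", "G2", "G3", "G4"]
gene_sets = {"SET_A": ["G1", "G2"], "SET_B": ["G3", "G4"], "SET_C": ["G2", "G3"]}
mask = build_mask(genes, gene_sets)
w1 = rng.normal(size=mask.shape)            # gene -> gene-set weights
w2 = rng.normal(size=(len(gene_sets), 2))   # gene-set -> superset weights
x = rng.normal(size=(5, len(genes)))        # expression matrix, samples x genes
set_scores, supersets = encode(x, w1, mask, w2)
```

In this sketch the columns of `w2` play the role of the trained superset weights, so ranking the entries of one column by magnitude would give the "top component gene sets" of that superset under these assumptions.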
Pages: 13