Mining influential genes based on deep learning

Cited by: 10
Authors
Kong, Lingpeng [1]
Chen, Yuanyuan [2]
Xu, Fengjiao [2]
Xu, Mingmin [1]
Li, Zutan [1]
Fang, Jingya [1]
Zhang, Liangyun [2]
Pian, Cong [2]
Affiliations
[1] Nanjing Agr Univ, Coll Agr, Nanjing 210095, Jiangsu, Peoples R China
[2] Nanjing Agr Univ, Coll Sci, Dept Math, Nanjing 210095, Peoples R China
Keywords
Landmark genes; Deep learning; AutoEncoder; DeepLIFT; CONNECTIVITY MAP; EXPRESSION; REPRESENTATION
DOI
10.1186/s12859-021-03972-5
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Large-scale gene expression profiling has been successfully applied to the discovery of functional connections among diseases, genetic perturbations, and drug action. To address the cost of ever-expanding gene expression profiling, a new, low-cost, high-throughput reduced-representation expression profiling method called L1000 was proposed, with which one million profiles were produced. Although a set of ~1000 carefully chosen landmark genes that can capture ~80% of the information in the whole genome has been identified for use in L1000, the robustness of using these landmark genes to infer target genes is not satisfactory. More efficient computational methods are therefore still needed to mine the influential genes in the genome in greater depth.

Results: Here, we propose a computational framework based on deep learning to mine a subset of genes that can cover more genomic information. Specifically, an AutoEncoder framework is first constructed to learn the non-linear relationships between genes, and then DeepLIFT is applied to calculate gene importance scores. Using this data-driven approach, we have re-obtained a landmark gene set. The results show that our landmark genes predict target genes more accurately and robustly than those of L1000 on two metrics: mean absolute error (MAE) and Pearson correlation coefficient (PCC). This indicates that the landmark genes detected by our method contain more genomic information.

Conclusions: We believe that our proposed framework is well suited to the analysis of biological big data. Furthermore, the landmark genes inferred in this study can be used for the large-scale amplification of gene expression profiles to facilitate research into functional connections.
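The two evaluation metrics named in the abstract, MAE and PCC, can be illustrated with a minimal sketch. It uses the standard textbook definitions; the abstract does not state how the authors computed or aggregated them across target genes. The function names and the toy expression values are hypothetical and are not taken from the paper.

```python
import math


def mean_absolute_error(measured, predicted):
    """Average absolute difference between measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def pearson_correlation(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_m * var_p)


# Toy values (hypothetical): expression of one target gene across four samples.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

mae = mean_absolute_error(measured, predicted)
pcc = pearson_correlation(measured, predicted)
print(f"MAE = {mae:.3f}, PCC = {pcc:.3f}")
```

In this toy case the prediction is offset by a constant, so the PCC is perfect while the MAE is not zero. The two metrics capture different aspects of prediction quality, which may be why the abstract reports both.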
Pages: 12