Cell Identity Codes: Understanding Cell Identity from Gene Expression Profiles using Deep Neural Networks

Cited by: 11
Authors
Abdolhosseini, Farzad [1 ]
Azarkhalili, Behrooz [2 ]
Maazallahi, Abbas [1 ]
Kamal, Aryan [1 ]
Motahari, Seyed Abolfazl [1 ]
Sharifi-Zarchi, Ali [1 ]
Chitsaz, Hamidreza [3 ]
Affiliations
[1] Sharif Univ Technol, Dept Comp Engn, Tehran, Iran
[2] ACECR, Royan Inst Stem Cell Biol & Technol, Tehran, Iran
[3] Colorado State Univ, Dept Comp Sci, Ft Collins, CO 80523 USA
Keywords
EMBRYONIC STEM-CELLS; DIFFERENTIATION; DISSECTION; DNA
DOI
10.1038/s41598-019-38798-y
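Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract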
Understanding cell identity is an important task in many biomedical areas. Expression patterns of specific marker genes have been used to characterize a limited number of cell types, but exclusive markers are not available for many cell types. A second approach is to use machine learning to discriminate cell types based on whole gene expression profiles (GEPs). The accuracy of simple classification algorithms such as linear discriminators or support vector machines is limited by the complexity of biological systems. We used deep neural networks to analyze 1040 GEPs from 16 different human tissues and cell types. After comparing different architectures, we identified a specific deep autoencoder structure that encodes a GEP into a vector of 30 numeric values, which we call the cell identity code (CIC). The original GEP can be reproduced from the CIC with an accuracy comparable to that of technical replicates of the same experiment. Although we train the autoencoder with an unsupervised approach, we show that different values of the CIC are connected to different biological aspects of the cell, such as different pathways or biological processes. The network can use the CIC to reproduce the GEPs of cell types it never saw during training. It can also tolerate some noise in the GEP measurement. Furthermore, we introduce the classifier autoencoder, an architecture that can accurately identify cell type based on the GEP or the CIC.
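The encode-then-reconstruct idea in the abstract can be illustrated with a minimal sketch. This is not the deep autoencoder the authors describe: it swaps in a linear autoencoder fitted by SVD (equivalent to PCA), and it runs on synthetic data rather than real GEPs. The sample count, gene count, noise level, and all function names below are assumptions made for illustration; only the 30-value code length comes from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed); only CODE_DIM = 30 is taken from the abstract.
N_SAMPLES, N_GENES, CODE_DIM = 200, 500, 30

# Synthetic stand-in for GEPs: a few hidden "programs" mixed across genes, plus noise.
programs = rng.normal(size=(N_SAMPLES, 10))
loadings = rng.normal(size=(10, N_GENES))
gep = programs @ loadings + 0.1 * rng.normal(size=(N_SAMPLES, N_GENES))

# Fit a linear autoencoder in closed form via SVD (a stand-in for the deep model).
mean = gep.mean(axis=0)
_, _, vt = np.linalg.svd(gep - mean, full_matrices=False)
components = vt[:CODE_DIM]  # shape (CODE_DIM, N_GENES)


def encode(profiles):
    """Map profiles of shape (n, N_GENES) to codes of shape (n, CODE_DIM)."""
    return (profiles - mean) @ components.T


def decode(codes):
    """Map codes of shape (n, CODE_DIM) back to profiles of shape (n, N_GENES)."""
    return codes @ components + mean


def rowwise_pearson(a, b):
    """Pearson correlation between matching rows of a and b."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a * a).sum(axis=1) * (b * b).sum(axis=1))


codes = encode(gep)
reconstructed = decode(codes)
correlations = rowwise_pearson(gep, reconstructed)

print("code shape:", codes.shape)
print("lowest per-sample correlation:", float(correlations.min()))
```

Each row of `codes` plays the role of a 30-value code, and the per-sample correlation between `gep` and `reconstructed` is one plausible way to measure reconstruction quality. A nonlinear deep autoencoder, as in the paper, would replace `encode` and `decode` with trained multi-layer networks.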
Pages: 14