Low Dimensionality in Gene Expression Data Enables the Accurate Extraction of Transcriptional Programs from Shallow Sequencing

Cited by: 101
Authors
Heimberg, Graham [1, 2, 3, 4]
Bhatnagar, Rajat [1, 3]
El-Samad, Hana [1, 3]
Thomson, Matt [3, 4]
Affiliations
[1] Univ Calif San Francisco, Calif Inst Quantitat Biosci, Dept Biochem & Biophys, San Francisco, CA 94158 USA
[2] Univ Calif San Francisco, Integrat Program Quantitat Biol, San Francisco, CA 94158 USA
[3] Univ Calif San Francisco, Ctr Syst & Synthet Biol, San Francisco, CA 94158 USA
[4] Univ Calif San Francisco, Dept Cellular & Mol Pharmacol, San Francisco, CA 94158 USA
Keywords
CELL RNA-SEQ; HETEROGENEITY; DECOMPOSITION; NETWORKS; MODULES; TISSUES
DOI
10.1016/j.cels.2016.04.001
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
A tradeoff between precision and throughput constrains all biological measurements, including sequencing-based technologies. Here, we develop a mathematical framework that defines this tradeoff between mRNA-sequencing depth and error in the extraction of biological information. We find that transcriptional programs can be reproducibly identified at 1% of conventional read depths. We demonstrate that this resilience to noise of "shallow" sequencing derives from a natural property, low dimensionality, which is a fundamental feature of gene expression data. Accordingly, our conclusions hold for ~350 single-cell and bulk gene expression datasets across yeast, mouse, and human. In total, our approach provides quantitative guidelines for the choice of sequencing depth necessary to achieve a desired level of analytical resolution. We codify these guidelines in an open-source read depth calculator. This work demonstrates that the structure inherent in biological networks can be productively exploited to increase measurement throughput, an idea that is now common in many branches of science, such as image processing.
Pages: 239-250
Page count: 12
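The abstract's central claim, that a low-dimensional expression structure survives sequencing at 1% of the usual depth, can be illustrated with a toy simulation. The sketch below is not the authors' mathematical framework or their read depth calculator. It assumes a hypothetical dataset driven by a single latent program, models sequencing as multinomial sampling of reads, and compares the leading principal component recovered at a "deep" and a 100-fold "shallow" depth. All function names, sizes, and depths are illustrative assumptions.

```python
import math
import random


def simulate_profiles(n_genes, n_samples, rng):
    """True expression fractions driven by one latent program (toy assumption)."""
    # Half of the genes go up with the program, the other half go down.
    loadings = [1.0 if g < n_genes // 2 else -1.0 for g in range(n_genes)]
    profiles = []
    for _ in range(n_samples):
        activity = rng.uniform(-1.0, 1.0)  # program activity in this sample
        raw = [math.exp(activity * w) for w in loadings]
        total = sum(raw)
        profiles.append([x / total for x in raw])
    return profiles


def sequence(profiles, depth, rng):
    """Draw `depth` reads per sample (multinomial) and return read fractions."""
    n_genes = len(profiles[0])
    observed = []
    for fractions in profiles:
        counts = [0] * n_genes
        for g in rng.choices(range(n_genes), weights=fractions, k=depth):
            counts[g] += 1
        observed.append([c / depth for c in counts])
    return observed


def first_pc(data, n_iter=100):
    """Leading principal component (unit-norm gene loadings) by power iteration."""
    n_samples, n_genes = len(data), len(data[0])
    means = [sum(row[g] for row in data) / n_samples for g in range(n_genes)]
    centered = [[row[g] - means[g] for g in range(n_genes)] for row in data]
    start = random.Random(0)  # random start, so it is not orthogonal to the PC
    v = [start.gauss(0.0, 1.0) for _ in range(n_genes)]
    for _ in range(n_iter):
        # Apply X^T X to v without forming the covariance matrix explicitly.
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(s * row[g] for s, row in zip(scores, centered))
             for g in range(n_genes)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


rng = random.Random(1)
profiles = simulate_profiles(n_genes=60, n_samples=40, rng=rng)
deep = sequence(profiles, depth=20000, rng=rng)    # stand-in for conventional depth
shallow = sequence(profiles, depth=200, rng=rng)   # 1% of the deep read count

pc_deep = first_pc(deep)
pc_shallow = first_pc(shallow)
# Absolute cosine similarity: 1.0 means the two depths recover the same program.
agreement = abs(sum(a * b for a, b in zip(pc_deep, pc_shallow)))
print(f"PC1 agreement between deep and shallow data: {agreement:.3f}")
```

In this toy setting the shallow data average only a few reads per gene, yet the leading component is pooled across all genes and samples, which is why it tends to remain close to the deep estimate. Real datasets have many programs of unequal strength, so the depth needed in practice depends on which component is being recovered.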