Diminishing returns in next-generation sequencing (NGS) transcriptome data

Cited by: 22
Authors
Lei, Rex [1 ,2 ]
Ye, Kaixiong [1 ]
Gu, Zhenglong [1 ]
Sun, Xuepeng [1 ,3 ]
Affiliations
[1] Cornell Univ, Div Nutr Sci, Ithaca, NY 14853 USA
[2] Ithaca High Sch, Ithaca, NY 14853 USA
[3] Zhejiang Univ, Coll Agr & Biotechnol, Hangzhou 310058, Zhejiang, Peoples R China
Funding
U.S. National Science Foundation
Keywords
RNA-seq efficiency; RNA-SEQ; GENOME; MICROARRAY; EXPRESSION; REPRODUCIBILITY; TECHNOLOGY; LANDSCAPE; BIOLOGY; DEPTH;
DOI
10.1016/j.gene.2014.12.013
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
RNA-seq is increasingly used to study gene expression in various organisms. While it provides a great opportunity to explore genome-scale transcriptional patterns at tremendous depth, its cost can be prohibitive. Establishing the minimal sequencing depth needed for a required accuracy will guide cost-effective experimental design and promote the routine application of RNA-seq. To address this issue, we selected 36 RNA-seq datasets, each with more than 20 million reads, from six widely used model organisms (Saccharomyces cerevisiae, Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus, and Arabidopsis thaliana) and investigated the statistical correlation between sequencing depth and the accuracy of the outcome. To do this, we randomly chose reads from each dataset, mapped them to the reference genomes, and analyzed the accuracy achieved at varying coverage. Our results indicate that as few as one million reads can provide the same accuracy in transcript abundance (r = 0.99) as >30 million reads for highly expressed genes in all six species. Because many metabolically and pathologically relevant genes are highly expressed, our findings may be instructive for cost-effective experimental design in NGS-based research and may also provide useful guidance for similar research in other organisms. © 2014 Elsevier B.V. All rights reserved.
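
The subsampling idea described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' pipeline: it skips read mapping entirely and treats each read as already assigned to a gene, it draws hypothetical expression levels from a log-normal distribution, and the names simulate_reads, counts_per_gene, subsample_correlation, and top_fraction are invented for this example. It only shows how the Pearson correlation between full-depth and subsampled counts could be computed for the most highly expressed genes.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate_reads(n_genes, n_reads, rng):
    """Assumption: each read is just a gene index, drawn with
    log-normally distributed expression weights (no real mapping)."""
    weights = [rng.lognormvariate(0, 2) for _ in range(n_genes)]
    return rng.choices(range(n_genes), weights=weights, k=n_reads)


def counts_per_gene(reads, n_genes):
    """Count how many reads fall on each gene."""
    counts = Counter(reads)
    return [counts.get(g, 0) for g in range(n_genes)]


def subsample_correlation(reads, n_genes, depth, rng, top_fraction=0.25):
    """Correlate full-depth counts with counts from a random subsample,
    restricted to the most highly expressed genes at full depth."""
    full = counts_per_gene(reads, n_genes)
    sub = counts_per_gene(rng.sample(reads, depth), n_genes)
    k = max(2, int(n_genes * top_fraction))
    top = sorted(range(n_genes), key=lambda g: full[g], reverse=True)[:k]
    return pearson([full[g] for g in top], [sub[g] for g in top])


rng = random.Random(0)  # fixed seed so the run is repeatable
N_GENES = 500
reads = simulate_reads(N_GENES, 200_000, rng)

results = {
    depth: subsample_correlation(reads, N_GENES, depth, rng)
    for depth in (1_000, 10_000, 100_000)
}
for depth, r in results.items():
    print(f"depth={depth:>7}  r={r:.4f}")
```

Because the Pearson coefficient is invariant to scaling, the raw counts are compared directly without normalizing for library size. The depths and gene count here are toy values chosen to run quickly and do not correspond to the datasets in the study.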
Pages: 82-87
Number of pages: 6