Accuracy of RNA-Seq and its dependence on sequencing depth

Cited by: 24
Authors
Cai, Guoshuai [1]
Li, Hua [2]
Lu, Yue [3]
Huang, Xuelin [4]
Lee, Juhee [4]
Mueller, Peter [5]
Ji, Yuan [4]
Liang, Shoudan [1]
Affiliations
[1] Univ Texas MD Anderson Canc Ctr, Dept Bioinformat & Computat Biol, Houston, TX 77030 USA
[2] Univ Texas MD Anderson Canc Ctr, Dept Stem Cell Transplantat & Cellular Therapy, Houston, TX 77030 USA
[3] Univ Texas MD Anderson Canc Ctr, Dept Leukemia, Houston, TX 77030 USA
[4] Univ Texas MD Anderson Canc Ctr, Dept Biostat, Houston, TX 77030 USA
[5] Univ Texas Austin, Dept Math, Austin, TX 78712 USA
Keywords
DIFFERENTIAL EXPRESSION;
DOI
10.1186/1471-2105-13-S13-S5
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: The cost of DNA sequencing has undergone a dramatic reduction in the past decade. As a result, sequencing technologies have been increasingly applied to genomic research. RNA-Seq is becoming a common technique for surveying gene expression based on DNA sequencing. As it is not clear how increased sequencing capacity has affected the measurement accuracy of mRNA, we sought to investigate that relationship. Results: We empirically evaluate the accuracy of repeated gene expression measurements using RNA-Seq. We identify library preparation steps prior to DNA sequencing as the main source of error in this process. Studying three datasets, we show that the accuracy indeed improves with the sequencing depth. However, the rate of improvement as a function of the number of sequence reads is generally slower than predicted by the binomial distribution. We therefore used the beta-binomial distribution to model the overdispersion. The overdispersion parameters we introduced depend explicitly on the number of reads, so that the resulting statistical uncertainty is consistent with the empirical observation that measurement accuracy increases with the sequencing depth. The overdispersion parameters were determined by maximizing the likelihood. We show that our modified beta-binomial model had a lower false discovery rate than the binomial or the pure beta-binomial models. Conclusion: We proposed a novel form of overdispersion guaranteeing that the accuracy improves with sequencing depth. We demonstrated that the new form provides a better fit to the data.
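The contrast the abstract draws between the binomial and the beta-binomial models can be illustrated with a minimal sketch. It uses only the standard ("pure") beta-binomial with a constant overdispersion `rho = 1/(a+b+1)`; the read-depth-dependent form proposed in the paper is not given in the abstract and is not reproduced here. The function names, the grid search, and all numeric values are assumptions chosen for illustration, not the authors' code or data.

```python
import math


def log_beta(x, y):
    """Log of the Beta function, via log-gamma for numerical stability."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def betabinom_logpmf(k, n, a, b):
    """Log-probability of k reads out of n under a standard beta-binomial(a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def fraction_variance(n, p, rho=0.0):
    """Variance of the observed fraction k/n; rho = 0 gives the binomial case."""
    return p * (1.0 - p) / n * (1.0 + (n - 1) * rho)


def fit_rho(counts, totals, p, grid):
    """Grid-search maximum-likelihood estimate of rho, with the mean p held fixed."""
    def loglik(rho):
        s = 1.0 / rho - 1.0          # a + b implied by rho
        a, b = p * s, (1.0 - p) * s  # split so that a / (a + b) = p
        return sum(betabinom_logpmf(k, n, a, b) for k, n in zip(counts, totals))
    return max(grid, key=loglik)


if __name__ == "__main__":
    p, rho = 0.25, 0.05  # hypothetical expression fraction and overdispersion
    for n in (10**2, 10**4, 10**6):
        # Binomial variance keeps shrinking with depth; constant-rho variance plateaus.
        print(n, fraction_variance(n, p), fraction_variance(n, p, rho))

    counts = [5, 40, 10, 35, 2, 45]  # made-up replicate counts, not from the paper
    totals = [50] * 6
    p_hat = sum(counts) / sum(totals)
    print(fit_rho(counts, totals, p_hat, [0.001, 0.01, 0.05, 0.1, 0.3, 0.5]))
```

With a constant `rho`, the variance of the observed fraction levels off near `p * (1 - p) * rho` as the number of reads grows, whereas the binomial variance keeps falling as `1/n`. This plateau is plausibly why an overdispersion that depends explicitly on the number of reads is needed for the modeled accuracy to keep improving with depth, though the abstract does not state the exact functional form.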
Pages: 13