The impact of read length on quantification of differentially expressed genes and splice junction detection

Cited by: 85
Authors
Chhangawala, Sagar [1, 2]
Rudy, Gabe [3]
Mason, Christopher E. [1, 4]
Rosenfeld, Jeffrey A. [4, 5]
Affiliations
[1] Weill Cornell Med Coll, Inst Computat Biomed, New York, NY 10021 USA
[2] Weill Cornell Med Coll, Dept Physiol & Biophys, New York, NY 10021 USA
[3] Golden Helix, Bozeman, MT 59718 USA
[4] Rutgers Canc Inst New Jersey, New Brunswick, NJ 08901 USA
[5] Amer Museum Nat Hist, New York, NY 10024 USA
Source
GENOME BIOLOGY | 2015 / Vol. 16
Keywords
RNA-SEQ; GENERATION; PACKAGE
DOI
10.1186/s13059-015-0697-y
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: The initial next-generation sequencing technologies produced reads of 25 or 36 bp, and only from a single-end of the library sequence. Currently, it is possible to reliably produce 300 bp paired-end sequences for RNA expression analysis. While read lengths have consistently increased, people have assumed that longer reads are more informative and that paired-end reads produce better results than single-end reads. We used paired-end 101 bp reads and trimmed them to simulate different read lengths, and also separated the pairs to produce single-end reads. For each read length and paired status, we evaluated differential expression levels between two standard samples and compared the results to those obtained by qPCR.
Results: We found that, with the exception of 25 bp reads, there is little difference for the detection of differential expression regardless of the read length. Once single-end reads are at a length of 50 bp, the results do not change substantially for any level up to, and including, 100 bp paired-end. However, splice junction detection significantly improves as the read length increases with 100 bp paired-end showing the best performance. We performed the same analysis on two ENCODE samples and found consistent results confirming that our conclusions have broad application.
Conclusions: A researcher could save substantial resources by using 50 bp single-end reads for differential expression analysis instead of using longer reads. However, splicing detection is unquestionably improved by paired-end and longer reads. Therefore, an appropriate read length should be used based on the final goal of the study.
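The read-trimming design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The `FastqRecord` type, the function names, the choice to trim from the 3' end, and the use of mate 1 alone as the single-end set are all assumptions made for illustration. The lengths 25, 50, and 100 bp are the ones named in the abstract.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class FastqRecord(NamedTuple):
    """Hypothetical in-memory stand-in for one FASTQ entry."""
    name: str
    sequence: str
    quality: str


def trim_reads(records: Iterable[FastqRecord], length: int) -> Iterator[FastqRecord]:
    """Keep the first `length` bases and quality scores of each read.

    Assumption: trimming removes bases from the 3' end.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    for rec in records:
        yield FastqRecord(rec.name, rec.sequence[:length], rec.quality[:length])


def simulate_designs(
    mate1: List[FastqRecord],
    mate2: List[FastqRecord],
    lengths: Tuple[int, ...],
) -> Dict[Tuple[int, str], list]:
    """Build one single-end and one paired-end read set per read length.

    Assumption: the single-end set is mate 1 with its partner discarded.
    """
    designs: Dict[Tuple[int, str], list] = {}
    for n in lengths:
        r1 = list(trim_reads(mate1, n))
        r2 = list(trim_reads(mate2, n))
        designs[(n, "single")] = r1
        designs[(n, "paired")] = list(zip(r1, r2))
    return designs


# Toy 101 bp read pair; real input would come from FASTQ files.
mate1 = [FastqRecord("read1/1", "ACGT" * 25 + "A", "I" * 101)]
mate2 = [FastqRecord("read1/2", "TGCA" * 25 + "T", "I" * 101)]
designs = simulate_designs(mate1, mate2, lengths=(25, 50, 100))
```

Each key of `designs` is a (read length, layout) pair, so every simulated configuration can be passed to the same downstream alignment and quantification steps and compared on equal terms.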
Pages: 10
Related papers
13 in total
[1]   Differential expression analysis for sequence count data [J].
Anders, Simon ;
Huber, Wolfgang .
GENOME BIOLOGY, 2010, 11 (10)
[2]   High-resolution profiling of histone methylations in the human genome [J].
Barski, Artem ;
Cuddapah, Suresh ;
Cui, Kairong ;
Roh, Tae-Young ;
Schones, Dustin E. ;
Wang, Zhibin ;
Wei, Gang ;
Chepelev, Iouri ;
Zhao, Keji .
CELL, 2007, 129 (04) :823-837
[3]   VennDiagram: a package for the generation of highly-customizable Venn and Euler diagrams in R [J].
Chen, Hanbo ;
Boutros, Paul C. .
BMC BIOINFORMATICS, 2011, 12
[4]   STAR: ultrafast universal RNA-seq aligner [J].
Dobin, Alexander ;
Davis, Carrie A. ;
Schlesinger, Felix ;
Drenkow, Jorg ;
Zaleski, Chris ;
Jha, Sonali ;
Batut, Philippe ;
Chaisson, Mark ;
Gingeras, Thomas R. .
BIOINFORMATICS, 2013, 29 (01) :15-21
[5]   EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments [J].
Leng, Ning ;
Dawson, John A. ;
Thomson, James A. ;
Ruotti, Victor ;
Rissman, Anna I. ;
Smits, Bart M. G. ;
Haag, Jill D. ;
Gould, Michael N. ;
Stewart, Ron M. ;
Kendziorski, Christina .
BIOINFORMATICS, 2013, 29 (08) :1035-1043
[6]   RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome [J].
Li, Bo ;
Dewey, Colin N. .
BMC BIOINFORMATICS, 2011, 12
[7]   Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study [J].
Li, Sheng ;
Tighe, Scott W. ;
Nicolet, Charles M. ;
Grove, Deborah ;
Levy, Shawn ;
Farmerie, William ;
Viale, Agnes ;
Wright, Chris ;
Schweitzer, Peter A. ;
Gao, Yuan ;
Kim, Dewey ;
Boland, Joe ;
Hicks, Belynda ;
Kim, Ryan ;
Chhangawala, Sagar ;
Jafari, Nadereh ;
Raghavachari, Nalini ;
Gandara, Jorge ;
Garcia-Reyero, Natalia ;
Hendrickson, Cynthia ;
Roberson, David ;
Rosenfeld, Jeffrey ;
Smith, Todd ;
Underwood, Jason G. ;
Wang, May ;
Zumbo, Paul ;
Baldwin, Don A. ;
Grills, George S. ;
Mason, Christopher E. .
NATURE BIOTECHNOLOGY, 2014, 32 (09) :915-925
[8]   BEDTools: a flexible suite of utilities for comparing genomic features [J].
Quinlan, Aaron R. ;
Hall, Ira M. .
BIOINFORMATICS, 2010, 26 (06) :841-842
[9]   edgeR: a Bioconductor package for differential expression analysis of digital gene expression data [J].
Robinson, Mark D. ;
McCarthy, Davis J. ;
Smyth, Gordon K. .
BIOINFORMATICS, 2010, 26 (01) :139-140
[10]   Investigating repetitively matching short sequencing reads: The enigmatic nature of H3K9me3 [J].
Rosenfeld, Jeffrey A. ;
Xuan, Zhenyu ;
DeSalle, Rob .
EPIGENETICS, 2009, 4 (07) :476-486