The fractured landscape of RNA-seq alignment: the default in our STARs

Cited by: 11
Authors
Ballouz, Sara [1]
Dobin, Alexander [1]
Gingeras, Thomas R. [1]
Gillis, Jesse [1]
Affiliations
[1] Cold Spring Harbor Lab, Stanley Inst Cognit Genom, Woodbury, NY 11797 USA
Funding
US National Institutes of Health (NIH)
Keywords
DIFFERENTIAL EXPRESSION METHODS; DOSAGE COMPENSATION; QUANTIFICATION; TRANSCRIPTOME; BENCHMARKING
DOI
10.1093/nar/gky325
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Many tools are available for RNA-seq alignment and expression quantification, with comparative value being hard to establish. Benchmarking assessments often highlight methods' good performance, but either focus on model data or fail to explain variation in performance. This leaves us to ask, what is the most meaningful way to assess different alignment choices? And importantly, where is there room for progress? In this work, we explore the answers to these two questions by performing an exhaustive assessment of the STAR aligner. We assess STAR's performance across a range of alignment parameters using common metrics, and then on biologically focused tasks. We find technical metrics such as fraction mapping or expression profile correlation to be uninformative, capturing properties unlikely to have any role in biological discovery. Surprisingly, we find that changes in alignment parameters within a wide range have little impact on either technical or biological performance. Yet, when performance finally does break, it happens in difficult regions, such as X-Y paralogs and MHC genes. We believe improved reporting by developers will help establish where results are likely to be robust or fragile, providing a better baseline to establish where methodological progress can still occur.
Pages: 5125-5138
Page count: 14