seqQscorer: automated quality control of next-generation sequencing data using machine learning

Cited by: 0
Authors
Steffen Albrecht
Maximilian Sprang
Miguel A. Andrade-Navarro
Jean-Fred Fontaine
Affiliation
[1] Johannes Gutenberg-Universität Mainz
Source
Genome Biology, Volume 22
Keywords
Next-generation sequencing data; Quality control; Machine learning; Classification; Bioinformatics
DOI
Not available
Abstract
Controlling quality of next-generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterize common NGS quality features and develop a novel quality control procedure involving tree-based and deep learning classification algorithms. Predictive models, validated on internal and external functional genomics datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at https://github.com/salbrec/seqQscorer.