AdapterRemoval: Easy cleaning of next-generation sequencing reads

Cited by: 448
Authors
Lindgreen S. [1, 2, 3]
Affiliations
[1] Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, 1350 Copenhagen K
[2] Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen
[3] School of Biological Sciences, University of Canterbury, Christchurch 8041
Keywords
Adapter trimming; Data pre-processing; Next-generation sequencing; Paired-end reads; Sequence alignment; Single-end reads
DOI
10.1186/1756-0500-5-337
Abstract
Background: With the advent of next-generation sequencing, there is an increased demand for tools to pre-process and handle the vast amounts of data generated. One recurring problem is adapter contamination in the reads, i.e. the partial or complete sequencing of adapter sequences. These adapter sequences have to be removed because they can hinder correct mapping of the reads and influence SNP calling and other downstream analyses.
Findings: We present a tool called AdapterRemoval, which is able to pre-process both single-end and paired-end data. The program locates and removes adapter residues from the reads, can combine paired reads if they overlap, and can optionally trim low-quality nucleotides. Furthermore, it can look for adapter sequence at both the 5' and 3' ends of the reads. It is a flexible tool that can be tuned to accommodate different experimental settings and sequencing platforms producing FASTQ files. AdapterRemoval is shown to trim adapters effectively from both single-end and paired-end data.
Conclusions: AdapterRemoval is a comprehensive tool for pre-processing next-generation sequencing data. It exhibits good performance in terms of both sensitivity and specificity. AdapterRemoval has already been used in various large projects, and it can be extended further to accommodate application-specific biases in the data. © 2012 Lindgreen.
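To make the pre-processing steps named in the abstract concrete, the following is a minimal Python sketch of 3' adapter removal and 3' quality trimming for a single read. It is an illustration only and is not AdapterRemoval's actual algorithm or interface: the function names, the mismatch-rate threshold, the minimum overlap, and the quality cutoff are all assumptions made for this example, and quality scores are assumed to be already decoded to integers.

```python
def trim_adapter_3prime(read, adapter, max_mismatch_rate=0.1, min_overlap=3):
    """Remove a 3' adapter, or a prefix of it, from a read.

    Hypothetical helper: scans candidate start positions left to right and
    cuts at the first one where the rest of the read matches the beginning
    of the adapter within the allowed mismatch rate.
    """
    for start in range(len(read) - min_overlap + 1):
        # Number of bases compared: the adapter may run off the end of the read.
        overlap = min(len(adapter), len(read) - start)
        mismatches = sum(
            1 for a, b in zip(read[start:start + overlap], adapter) if a != b
        )
        if mismatches <= max_mismatch_rate * overlap:
            return read[:start]
    return read  # no adapter found


def trim_low_quality_3prime(read, quals, min_qual=20):
    """Drop trailing bases whose quality score is below min_qual.

    Hypothetical helper: quals is a list of integer scores, one per base.
    """
    end = len(read)
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1
    return read[:end], quals[:end]


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"
    # Insert of 8 bases followed by the first 10 bases of the adapter.
    print(trim_adapter_3prime("ACGTACGTAGATCGGAAG", adapter))
    print(trim_low_quality_3prime("ACGTAC", [30, 30, 30, 30, 10, 5]))
```

A real tool additionally has to handle paired-end reads, 5' adapters, and ambiguous bases, and would normally weigh mismatches by base quality; none of that is attempted here.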