fastp: an ultra-fast all-in-one FASTQ preprocessor

Citations: 16212
Authors
Chen, Shifu [1, 2]
Zhou, Yanqing [1]
Chen, Yaru [1]
Gu, Jia [2]
Affiliations
[1] HaploX Biotechnol, Dept Bioinformat, Shenzhen 518057, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Funding
National Science Foundation (United States);
Keywords
READ ALIGNMENT;
DOI
10.1093/bioinformatics/bty560
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Motivation: Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming and quality filtering. These tools are often insufficiently fast as most are developed using high-level programming languages (e.g. Python and Java) and provide limited multi-threading support. Reading and loading data multiple times also renders preprocessing slow and I/O inefficient.
Results: We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2-5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt despite performing far more operations than similar tools.
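The single-scan idea in the abstract can be illustrated with a small sketch. The Python below is not fastp's implementation (fastp is written in C++ and is multi-threaded); it is a minimal, single-threaded illustration of gathering quality statistics, trimming low-quality tails and filtering short reads in one pass over the records. All names (`ScanStats`, `read_fastq`, `trim_tail`, `single_pass`), the thresholds and the Phred+33 encoding are assumptions made for this example.

```python
import io
from dataclasses import dataclass

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding


@dataclass
class ScanStats:
    total_reads: int = 0
    passed_reads: int = 0
    total_bases: int = 0
    q20_bases: int = 0


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        name = handle.readline().rstrip("\n")
        if not name:  # end of input
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield name, seq, qual


def trim_tail(seq, qual, min_q):
    """Drop trailing bases whose quality score is below min_q."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < min_q:
        end -= 1
    return seq[:end], qual[:end]


def single_pass(handle, min_q=20, min_len=4):
    """Collect pre-filter statistics, trim and filter in one scan of the input."""
    stats = ScanStats()
    kept = []
    for name, seq, qual in read_fastq(handle):
        # Quality-control statistics are computed on the raw read.
        stats.total_reads += 1
        stats.total_bases += len(seq)
        stats.q20_bases += sum(1 for c in qual if ord(c) - PHRED_OFFSET >= 20)
        # Trimming and length filtering happen on the same record,
        # so the file never has to be read a second time.
        seq, qual = trim_tail(seq, qual, min_q)
        if len(seq) >= min_len:
            stats.passed_reads += 1
            kept.append((name, seq, qual))
    return stats, kept


# Hypothetical two-read input: 'I' encodes Q40, '#' encodes Q2.
demo = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIII##\n"
    "@r2\nACGT\n+\n####\n"
)
stats, kept = single_pass(demo)
print(stats)
print(kept)
```

Running separate tools for statistics, trimming and filtering would parse the same file once per tool; folding the steps into one loop is the I/O saving the abstract refers to.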
Pages: 884-890
Page count: 7