fastp: an ultra-fast all-in-one FASTQ preprocessor

Cited by: 14028
Authors
Chen, Shifu [1, 2]
Zhou, Yanqing [1]
Chen, Yaru [1]
Gu, Jia [2]
Affiliations
[1] HaploX Biotechnol, Dept Bioinformat, Shenzhen 518057, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Funding
US National Science Foundation;
Keywords
READ ALIGNMENT;
DOI
10.1093/bioinformatics/bty560
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming and quality filtering. These tools are often insufficiently fast as most are developed using high-level programming languages (e.g. Python and Java) and provide limited multi-threading support. Reading and loading data multiple times also renders preprocessing slow and I/O inefficient. Results: We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2-5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt despite performing far more operations than similar tools.
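The single-scan idea in the abstract can be illustrated with a minimal Python sketch. This is not fastp's implementation (the abstract describes a multi-threaded program, and none of its internals are given here). It only shows how statistics gathering, tail quality trimming and length filtering can share one read of the data instead of three. The function names, the Phred+33 encoding, and the thresholds `min_q` and `min_len` are all assumptions made for illustration.

```python
import io
from collections import Counter


def phred33(qual):
    """Decode a quality string into integer scores (assumes Phred+33)."""
    return [ord(c) - 33 for c in qual]


def trim_tail(seq, qual, min_q):
    """Drop trailing bases whose quality score is below min_q."""
    scores = phred33(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def single_pass(handle, min_q=20, min_len=4):
    """Read each 4-line FASTQ record once; count, trim and filter together.

    Hypothetical helper: thresholds and counter names are illustrative only.
    """
    stats = Counter()
    kept = []
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")

        # Quality-control statistics on the raw read.
        stats["reads_in"] += 1
        stats["bases_in"] += len(seq)

        # Trimming and filtering on the same in-memory record.
        seq, qual = trim_tail(seq, qual, min_q)
        if len(seq) < min_len:
            stats["reads_dropped"] += 1
            continue

        stats["reads_out"] += 1
        stats["bases_out"] += len(seq)
        kept.append((header, seq, qual))
    return kept, stats
```

Calling `single_pass(io.StringIO(text))` on an in-memory FASTQ string returns the surviving records together with the before and after counters, so no second pass over the input is needed to produce the summary.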
Pages: 884-890
Page count: 7