Compression of Nanopore FASTQ Files

Cited by: 2
Authors
Dufort y Alvarez, Guillermo [1]
Seroussi, Gadiel [1, 2]
Smircich, Pablo [3, 4]
Sotelo, Jose [3, 4]
Ochoa, Idoia [5]
Martin, Alvaro [1]
Affiliations
[1] Univ Republica, Fac Ingn, Montevideo, Uruguay
[2] Xperi Corp, San Jose, CA USA
[3] Univ Republica, Fac Ciencias, Montevideo, Uruguay
[4] Inst Invest Biol Clemente Estable, Dept Genom, Montevideo, Uruguay
[5] Univ Illinois, Elect & Comp Engn, Urbana, IL 61801 USA
Source
BIOINFORMATICS AND BIOMEDICAL ENGINEERING, IWBBIO 2019, PT I | 2019 / Vol. 11465
Keywords
Genomic data compression; FASTQ compression; Nanopore sequencing technology;
DOI
10.1007/978-3-030-17938-0_4
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
The research and development of tools for genomic data compression has focused so far on data generated by second-generation sequencing technologies, while third-generation technologies, such as nanopore technologies, have received little attention in the data compression research community. In this paper, we investigate compression schemes for nanopore FASTQ files. We propose a nanopore quality scores compressor, called DualCtx, which yields significant improvements in compression performance over the state of the art. We also extend DualCtx to a full FASTQ compressor, termed DualFqz, by substituting DualCtx for the quality score compression module in a variant of Fqzcomp. We tested DualFqz and various existing compressors on a large nanopore data set. The results show that DualFqz achieves the best compression performance. The experiments also show that most current implementations of compressors fail to execute correctly on files with long, variable-length reads. DualCtx and DualFqz are freely available for download at: https://github.com/guidufort/DualFqz.
Pages: 36-47
Page count: 12