Reference-free compression of next-generation sequencing data in FASTQ format

Citations: 0
Authors
Tan, Li [1]
Sun, Jifeng [2]
Affiliations
[1] Guangzhou Maritime Inst, Sch Informat & Commun Engn, Guangzhou, Guangdong, Peoples R China
[2] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou, Guangdong, Peoples R China
Source
2017 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (CIBCB) | 2017
Keywords
NGS; DEMT model; DSRC; Lossless compression; LOSS-LESS COMPRESSION;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we present a new reference-free, lossless approach to compressing next-generation sequencing (NGS) data in FASTQ format. The input FASTQ data is split into three parts (metadata, short reads, and quality scores), and the redundancy in each part is eliminated independently according to its own characteristics. Experiments were conducted on five real-world NGS data sets. The results show that the proposed algorithm achieves better compression gain than previous state-of-the-art compression algorithms.
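The stream separation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the record only names the three streams, so the function names (`split_fastq`, `compress_streams`, `decompress_streams`) are hypothetical, and the general-purpose `zlib` compressor stands in for whatever stream-specific models the paper uses. The sketch also assumes four-line records with a bare `+` separator line.

```python
import zlib


def split_fastq(text):
    """Split FASTQ text into (metadata, reads, qualities) lists.

    Assumption: every record is exactly four lines and the third
    line is a bare "+", so dropping it loses no information.
    """
    lines = text.strip().split("\n")
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    meta, reads, quals = [], [], []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or plus != "+":
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"read/quality length mismatch at line {i + 1}")
        meta.append(header[1:])  # strip the leading "@"
        reads.append(seq)
        quals.append(qual)
    return meta, reads, quals


def compress_streams(text):
    """Compress each stream independently (zlib is a stand-in codec)."""
    return tuple(
        zlib.compress("\n".join(stream).encode("ascii"), 9)
        for stream in split_fastq(text)
    )


def decompress_streams(blobs):
    """Rebuild the FASTQ text from the three compressed streams."""
    meta, reads, quals = (
        zlib.decompress(blob).decode("ascii").split("\n") for blob in blobs
    )
    return "".join(
        f"@{m}\n{r}\n+\n{q}\n" for m, r, q in zip(meta, reads, quals)
    )
```

Keeping the streams apart lets each one be modelled separately: read identifiers tend to be highly repetitive, reads draw on a very small alphabet, and quality scores have their own statistics. A single generic compressor over the interleaved file cannot exploit these differences as directly.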
Pages: 10-13
Page count: 4
Related papers
50 in total
  • [1] FQZip: Lossless Reference-Based Compression of Next Generation Sequencing Data in FASTQ Format
    Zhang, Yongpeng
    Li, Linsen
    Xiao, Jun
    Yang, Yanli
    Zhu, Zexuan
    PROCEEDINGS OF THE 18TH ASIA PACIFIC SYMPOSIUM ON INTELLIGENT AND EVOLUTIONARY SYSTEMS, VOL 2, 2015, : 127 - 135
  • [2] Transformations for the compression of FASTQ quality scores of next-generation sequencing data
    Wan, Raymond
    Vo Ngoc Anh
    Asai, Kiyoshi
    BIOINFORMATICS, 2012, 28 (05) : 628 - 635
  • [3] Compression of FASTQ and SAM Format Sequencing Data
    Bonfield, James K.
    Mahoney, Matthew V.
    PLOS ONE, 2013, 8 (03):
  • [4] Reference-free transcriptome assembly in non-model animals from next-generation sequencing data
    Cahais, V.
    Gayral, P.
    Tsagkogeorga, G.
    Melo-Ferreira, J.
    Ballenghien, M.
    Weinert, L.
    Chiari, Y.
    Belkhir, K.
    Ranwez, V.
    Galtier, N.
    MOLECULAR ECOLOGY RESOURCES, 2012, 12 (05) : 834 - 845
  • [5] Lossless and reference-free compression of FASTQ/A files using GeneSqueeze
    Nazari, Foad
    Patel, Sneh
    Larocca, Melissa
    Sansevich, Alina
    Czarny, Ryan
    Schena, Giana
    Murray, Emma K.
    SCIENTIFIC REPORTS, 2025, 15 (01):
  • [6] Reference-Free Imputation of Targeted Next-Generation Sequence Datasets
    Nampally, Arun
    Kim, Joseph
    Proffitt, Eric
    Palovcak, Eugene
    Lacoste, Alix
    14TH ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, BCB 2023, 2023,
  • [7] SPRING: a next-generation compressor for FASTQ data
    Chandak, Shubham
    Tatwawadi, Kedar
    Ochoa, Idoia
    Hernaez, Mikel
    Weissman, Tsachy
    BIOINFORMATICS, 2019, 35 (15) : 2674 - 2676
  • [8] No-Reference Compression of Genomic Data Stored In FASTQ Format
    Bhola, Vishal
    Bopardikar, Ajit S.
    Narayanan, Rangavittal
    Lee, Kyusang
    Ahn, TaeJin
    2011 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM 2011), 2011, : 147 - 150
  • [9] A reference-free approach to analyse RADseq data using standard next generation sequencing toolkits
    Heller, Rasmus
    Nursyifa, Casia
    Garcia-Erill, Genis
    Salmona, Jordi
    Chikhi, Lounes
    Meisner, Jonas
    Korneliussen, Thorfinn Sand
    Albrechtsen, Anders
    MOLECULAR ECOLOGY RESOURCES, 2021, 21 (04) : 1085 - 1097
  • [10] Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines
    Frampton, Matthew
    Houlston, Richard
    PLOS ONE, 2012, 7 (11):