Predictive Coding of Aligned Next-Generation Sequencing Data

Cited by: 5
Authors
Voges, Jan [1]
Munderloh, Marco [1]
Ostermann, Joern [1]
Affiliations
[1] Leibniz Univ Hannover, TNT, Inst Informat Verarbeitung, Appelstr 9A, D-30167 Hannover, Germany
Keywords
READ ALIGNMENT; COMPRESSION; FORMAT
DOI
10.1109/DCC.2016.98
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Due to novel high-throughput next-generation sequencing technologies, the sequencing of huge amounts of genetic information has become affordable. On account of this flood of data, IT costs have become a major obstacle compared to sequencing costs. High-performance compression of genomic data is required to reduce the storage size and transmission costs. The high coverage inherent in next-generation sequencing technologies produces highly redundant data. This paper describes a compression algorithm for aligned sequence reads. The proposed algorithm combines alignment information to implicitly assemble local parts of the donor genome in order to compress the sequence reads. In contrast to other algorithms, the proposed compressor does not need a reference to encode sequence reads. Compression is performed on-the-fly using solely a sliding window (i.e., a permanently updated short-time memory) as context for the prediction of sequence reads. The algorithm yields compression results on par with or better than the state of the art, compressing the data down to 1.9% of the original size at speeds of up to 60 MB/s and with a minute memory consumption of only several kilobytes, fitting in today's level 1 CPU caches.
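
The sliding-window idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `predict_read`, `encode`, `decode`, and `window_size` are hypothetical, reads are assumed to be gap-free `(position, sequence)` pairs sorted by position, prediction is a plain per-position majority vote, and the entropy-coding stage a real compressor would apply to the residuals is left out.

```python
from collections import Counter, deque


def predict_read(window, pos, length):
    """Predict a read at `pos` by majority vote over overlapping reads in the window."""
    votes = [Counter() for _ in range(length)]
    for wpos, wseq in window:
        for i, base in enumerate(wseq):
            j = wpos + i - pos  # offset of this base inside the read being predicted
            if 0 <= j < length:
                votes[j][base] += 1
    # None marks positions that no read in the window covers.
    return [v.most_common(1)[0][0] if v else None for v in votes]


def encode(reads, window_size=4):
    """Turn each read into (pos, length, residual); residual lists mispredicted bases."""
    window = deque(maxlen=window_size)  # short-time memory; oldest read drops out
    encoded = []
    for pos, seq in reads:
        pred = predict_read(window, pos, len(seq))
        residual = [(i, b) for i, b in enumerate(seq) if pred[i] != b]
        encoded.append((pos, len(seq), residual))
        window.append((pos, seq))
    return encoded


def decode(encoded, window_size=4):
    """Rebuild the reads by repeating the same predictions and applying the residuals."""
    window = deque(maxlen=window_size)
    reads = []
    for pos, length, residual in encoded:
        pred = predict_read(window, pos, length)
        for i, b in residual:
            pred[i] = b  # uncovered (None) positions always appear in the residual
        seq = "".join(pred)
        window.append((pos, seq))
        reads.append((pos, seq))
    return reads


reads = [(0, "ACGTAC"), (2, "GTACGG"), (3, "TACGGA"), (4, "ACTGAT")]
encoded = encode(reads)
```

Because the decoder rebuilds the same window from already-decoded reads, no reference sequence has to be stored or transmitted; only positions, lengths, and the bases the window failed to predict remain. In this toy input the second read overlaps the first by four bases, so only its last two bases end up in the residual.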
Pages: 241-250
Number of pages: 10