Compression of Next-Generation Sequencing Data and of DNA Digital Files

Cited by: 3
Authors
Carpentieri, Bruno [1]
Affiliations
[1] Univ Salerno, Dipartimento Informat, Via Giovanni Paolo II 132, I-84084 Fisciano, SA, Italy
Keywords
data compression; Next-Generation Sequencing data; DNA; genomes; genomic data
DOI
10.3390/a13060151
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The memory and network traffic consumed by newly sequenced biological data have grown sharply in recent years. Genomic projects such as HapMap and 1000 Genomes have contributed to the very large growth of databases and network traffic related to genomic data, and to the development of new, efficient technologies. Large-scale sequencing of DNA samples has attracted new attention and produced new research, and the scientific community's interest in genomic data has greatly increased. In a very short time, researchers have developed hardware tools, analysis software, algorithms, private databases, and infrastructures to support genomics research. In this paper, we analyze different approaches for compressing digital files that are generated by Next-Generation Sequencing tools and contain nucleotide sequences, and we discuss and evaluate the compression performance of generic compression algorithms by comparing them with Quip, a system designed by Jones et al. specifically for genomic file compression. Moreover, we present a simple but effective technique for the compression of DNA sequences in which we consider only the relevant DNA data, and we experimentally evaluate its performance.
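The abstract does not spell out how "only the relevant DNA data" is isolated or encoded, so the following is only a minimal sketch of one plausible reading, not the paper's method. It assumes a 4-line-per-record FASTQ input with unwrapped sequences, keeps just the nucleotide line of each record, and packs the bases at 2 bits each. The function names, the `CODE` table, and the choice of `zlib` as the generic compressor are all hypothetical choices made for illustration; bases other than A, C, G, T (such as N) are not handled.

```python
import zlib

# Hypothetical 2-bit code for the four nucleotides (an assumption, not from the paper).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def extract_sequences(fastq_text):
    """Return the nucleotide line (2nd line) of every 4-line FASTQ record."""
    lines = fastq_text.splitlines()
    return [lines[i] for i in range(1, len(lines), 4)]


def pack_2bit(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so unpacking reads bases from the top bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_2bit(data, length):
    """Inverse of pack_2bit; `length` is the original number of bases."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


if __name__ == "__main__":
    fastq = "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n@r2\nGGCATTGACA\n+\nIIIIIIIIII\n"
    dna = "".join(extract_sequences(fastq))
    packed = pack_2bit(dna)
    assert unpack_2bit(packed, len(dna)) == dna
    # Compare against a generic compressor applied to the whole file.
    print("whole file, zlib:", len(zlib.compress(fastq.encode("ascii"))), "bytes")
    print("DNA only, 2-bit :", len(packed), "bytes")
```

On inputs this small the byte counts are not meaningful; the sketch only shows the shape of the comparison between a generic compressor on the full file and a DNA-only encoding.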
Pages: 11
Related papers
50 items in total
  • [31] Pathway analysis with next-generation sequencing data
    Zhao, Jinying
    Zhu, Yun
    Boerwinkle, Eric
    Xiong, Momiao
    EUROPEAN JOURNAL OF HUMAN GENETICS, 2015, 23 (04) : 507 - 515
  • [32] Genotyping microsatellites in next-generation sequencing data
    Dashnow, Harriet
    Tan, Susan
    Das, Debjani
    Easteal, Simon
    Oshlack, Alicia
    BMC BIOINFORMATICS, 2015, 16
  • [33] Genotyping microsatellites in next-generation sequencing data
    Harriet Dashnow
    Susan Tan
    Debjani Das
    Simon Easteal
    Alicia Oshlack
    BMC Bioinformatics, 16
  • [34] Digital Fetal Aneuploidy Diagnosis by Next-Generation Sequencing
    Voelkerding, Karl V.
    Lyon, Elaine
    CLINICAL CHEMISTRY, 2010, 56 (03) : 336 - 338
  • [35] Next-Generation Digital Information Storage in DNA
    Church, George M.
    Gao, Yuan
    Kosuri, Sriram
    SCIENCE, 2012, 337 (6102) : 1628 - 1628
  • [36] A framework for variation discovery and genotyping using next-generation DNA sequencing data
    DePristo, Mark A.
    Banks, Eric
    Poplin, Ryan
    Garimella, Kiran V.
    Maguire, Jared R.
    Hartl, Christopher
    Philippakis, Anthony A.
    del Angel, Guillermo
    Rivas, Manuel A.
    Hanna, Matt
    McKenna, Aaron
    Fennell, Tim J.
    Kernytsky, Andrew M.
    Sivachenko, Andrey Y.
    Cibulskis, Kristian
    Gabriel, Stacey B.
    Altshuler, David
    Daly, Mark J.
    NATURE GENETICS, 2011, 43 (05) : 491 - 498
  • [37] AQME: A forensic mitochondrial DNA analysis tool for next-generation sequencing data
    Sturk-Andreaggi, Kimberly
    Peck, Michelle A.
    Boysen, Cecilie
    Dekker, Patrick
    McMahon, Timothy P.
    Marshall, Charla K.
    FORENSIC SCIENCE INTERNATIONAL-GENETICS, 2017, 31 : 189 - 197
  • [38] A framework for variation discovery and genotyping using next-generation DNA sequencing data
    Mark A DePristo
    Eric Banks
    Ryan Poplin
    Kiran V Garimella
    Jared R Maguire
    Christopher Hartl
    Anthony A Philippakis
    Guillermo del Angel
    Manuel A Rivas
    Matt Hanna
    Aaron McKenna
    Tim J Fennell
    Andrew M Kernytsky
    Andrey Y Sivachenko
    Kristian Cibulskis
    Stacey B Gabriel
    David Altshuler
    Mark J Daly
    Nature Genetics, 2011, 43 : 491 - 498
  • [39] Perspectives of DNA microarray and next-generation DNA sequencing technologies
    Teng XiaoKun
    Xiao HuaSheng
    SCIENCE IN CHINA SERIES C-LIFE SCIENCES, 2009, 52 (01) : 7 - 16
  • [40] A Highly Parallel Next-Generation DNA Sequencing Data Analysis Pipeline in Hadoop
    Aggour, Kareem S.
    Kumar, Vijay S.
    Sangurdekar, Dipen P.
    Newberg, Lee A.
    Kodira, Chinnappa D.
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2015 : 756 - 763