data compression;
Next-Generation Sequencing data;
DNA;
genomes;
GENOMIC DATA;
DOI:
10.3390/a13060151
Chinese Library Classification (CLC) number:
TP18 [Artificial intelligence theory];
Discipline classification codes:
081104;
0812;
0835;
1405;
Abstract:
The memory and network traffic consumed by newly sequenced biological data have grown sharply in recent years. Genomic projects such as HapMap and 1000 Genomes have contributed to the very large growth of databases and network traffic related to genomic data, and to the development of new, efficient technologies. Large-scale sequencing of DNA samples has attracted new attention and spurred new research, and the scientific community's interest in genomic data has greatly increased. In a very short time, researchers have developed hardware tools, analysis software, algorithms, private databases, and infrastructures to support research in genomics. In this paper, we analyze different approaches for compressing the digital files, containing nucleotide sequences, that are generated by Next-Generation Sequencing tools. We discuss and evaluate the compression performance of generic compression algorithms by comparing them with Quip, a system designed by Jones et al. specifically for genomic file compression. Moreover, we present a simple but effective technique for the compression of DNA sequences in which we consider only the relevant DNA data, and we experimentally evaluate its performance.