Genomic Data Compression

Cited by: 39
Authors
Hernaez, Mikel [1]
Pavlichin, Dmitri [2]
Weissman, Tsachy [2]
Ochoa, Idoia [3]
Affiliations
[1] Univ Illinois, Carl R Woese Inst Genom Biol, Urbana, IL 61801 USA
[2] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
[3] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
Source
ANNUAL REVIEW OF BIOMEDICAL DATA SCIENCE, VOL 2, 2019 | 2019 / Vol. 2
Keywords
genomic data; compression; storage; QUALITY SCORE COMPRESSION; LOSSY COMPRESSION; SEQUENCE; IDENTIFICATIONS; ALGORITHMS; PROTEIN; FORMAT; PRIDE; READS;
DOI
10.1146/annurev-biodatasci-072018-021229
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Recently, there has been growing interest in genome sequencing, driven by advances in sequencing technology, in terms of both efficiency and affordability. These developments have allowed many to envision whole-genome sequencing as an invaluable tool for both personalized medical care and public health. As a result, increasingly large and ubiquitous genomic data sets are being generated. This poses a significant challenge for the storage and transmission of these data. Already, it is more expensive to store genomic data for a decade than it is to obtain the data in the first place. This situation calls for efficient representations of genomic information. In this review, we emphasize the need for designing specialized compressors tailored to genomic data and describe the main solutions already proposed. We also give general guidelines for storing these data and conclude with our thoughts on the future of genomic formats and compressors.
Pages: 19-37
Page count: 19
Related papers
50 in total
  • [1] Genomic Data Clustering on FPGAs for Compression
    Petraglio, Enrico
    Wertenbroek, Rick
    Capitao, Flavio
    Guex, Nicolas
    Iseli, Christian
    Thoma, Yann
    APPLIED RECONFIGURABLE COMPUTING, 2017, 10216 : 229 - 240
  • [2] Data structures and compression algorithms for genomic sequence data
    Brandon, Marty C.
    Wallace, Douglas C.
    Baldi, Pierre
    BIOINFORMATICS, 2009, 25 (14) : 1731 - 1738
  • [3] Fast Genomic Data Compression on Multicore Machines
    Sanz, Victoria
    Pousa, Adrian
    Naiouf, Marcelo
    De Giusti, Armando
    CLOUD COMPUTING, BIG DATA AND EMERGING TOPICS, JCC-BD&ET 2024, 2025, 2189 : 3 - 13
  • [4] Lossy compression of quality scores in genomic data
    Canovas, Rodrigo
    Moffat, Alistair
    Turpin, Andrew
    BIOINFORMATICS, 2014, 30 (15) : 2130 - 2136
  • [5] Aligned genomic data compression via improved modeling
    Ochoa, Idoia
    Hernaez, Mikel
    Weissman, Tsachy
    JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2014, 12 (06)
  • [6] An Approach to Compression of Genomic Data Based on Image File Format
    Martins, Juliano V.
    Kredens, Kelvin V.
    Dordall, Osmar B.
    Arruda, Paulo H. S.
    Borges, André P.
    Herai, Roberto H.
    Scalabrin, Edson E.
    Avila, Braulio C.
    2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018, : 3274 - 3279
  • [7] No-Reference Compression of Genomic Data Stored In FASTQ Format
    Bhola, Vishal
    Bopardikar, Ajit S.
    Narayanan, Rangavittal
    Lee, Kyusang
    Ahn, TaeJin
    2011 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM 2011), 2011, : 147 - 150
  • [8] FPGA Acceleration of Reference-Based Compression for Genomic Data
    Arram, James
    Pflanzer, Moritz
    Kaplan, Thomas
    Luk, Wayne
    2015 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY (FPT), 2015, : 9 - 16
  • [9] HRCM: An Efficient Hybrid Referential Compression Method for Genomic Big Data
    Yao, Haichang
    Ji, Yimu
    Li, Kui
    Liu, Shangdong
    He, Jing
    Wang, Ruchuan
    BIOMED RESEARCH INTERNATIONAL, 2019, 2019
  • [10] A bucket index correction based method for compression of genomic sequencing data
    Wang, Rongjie
    Bai, Yang
    Cheng, Qianlong
    Zang, Tianyi
    Wang, Yadong
    2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2017, : 634 - 637