Enabling the Efficient, Dependable Cloud-based Storage of Human Genomes

Times cited: 1
Authors
Cogo, Vinicius Vielmo [1]
Bessani, Alysson [1]
Affiliation
[1] Univ Lisbon, Fac Ciencias, LASIGE, Lisbon, Portugal
Source
2019 38TH INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS WORKSHOPS (SRDSW 2019) | 2019
Funding
European Union Horizon 2020
Keywords
Data Storage; Dependability; Cloud; Genomes; Privacy
DOI
10.1109/SRDSW49218.2019.00011
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Efficiently storing large data sets of human genomes is a long-term ambition of both the research and clinical life sciences communities. For instance, biobanks stock thousands to millions of physical biological samples and have been under pressure to also store the resulting digitized genomes. However, these and other life sciences institutions lack the infrastructure and expertise to store this data efficiently. Cloud computing is a natural, economical alternative to private infrastructures, but it is a weaker alternative in terms of security and privacy. In this work, we present an end-to-end composite pipeline intended to enable the efficient, dependable cloud-based storage of human genomes by integrating three mechanisms we have recently proposed: (1) a privacy-sensitivity detector for human genomes, (2) a similarity-based deduplication and delta-encoding algorithm for sequencing data, and (3) an auditability scheme to verify who has effectively read data in storage systems that use secure information dispersal. By integrating these mechanisms with appropriate storage configurations, one can obtain reasonable privacy protection, security, and dependability guarantees at modest costs (e.g., less than $1 per genome per year). Our preliminary analysis indicates that this pipeline costs only 3% more than non-replicated systems, 48% less than fully replicating all data, and 31% less than secure information dispersal schemes.
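To make the composition of the first two mechanisms more concrete, the following is a minimal sketch under loudly stated assumptions; it is not the authors' implementation. It assumes that the privacy-sensitivity detector can be approximated by checking whether a read shares any k-mer with a set of known sensitive sequences, that sensitive reads are sent to a secure information-dispersal backend, and that non-sensitive reads go to a cheaper single-cloud backend, stored either verbatim or as a position-wise delta against a similar, previously stored read of the same length. All names (`is_sensitive`, `delta_encode`, `delta_decode`, `route`, `Placement`) and parameters (`k`, `max_diffs`) are hypothetical, and the auditability scheme is not modeled.

```python
from dataclasses import dataclass, field


def is_sensitive(read: str, sensitive_kmers: set[str], k: int) -> bool:
    # Hypothetical stand-in for the privacy-sensitivity detector: flag the
    # read if any of its k-mers appears in the set of known sensitive k-mers.
    return any(read[i:i + k] in sensitive_kmers for i in range(len(read) - k + 1))


def delta_encode(read: str, ref: str) -> list[tuple[int, str]]:
    # Hypothetical delta encoding for equal-length reads: keep only the
    # (position, base) pairs where the read differs from the reference.
    return [(i, b) for i, (a, b) in enumerate(zip(ref, read)) if a != b]


def delta_decode(ref: str, delta: list[tuple[int, str]]) -> str:
    # Rebuild the original read by applying the stored differences to ref.
    out = list(ref)
    for i, b in delta:
        out[i] = b
    return "".join(out)


@dataclass
class Placement:
    # Sensitive reads: assumed to go to a secure information-dispersal backend.
    dispersal: list[str] = field(default_factory=list)
    # Non-sensitive reads: assumed to go to a single-cloud backend, as
    # ("raw", read) or ("delta", index_of_reference, differences).
    single_cloud: list = field(default_factory=list)


def route(reads: list[str], sensitive_kmers: set[str], k: int, max_diffs: int) -> Placement:
    placement = Placement()
    stored: list[str] = []  # verbatim non-sensitive reads, usable as delta references
    for read in reads:
        if is_sensitive(read, sensitive_kmers, k):
            placement.dispersal.append(read)
            continue
        # Naive similarity search: first stored read of equal length that
        # differs in at most max_diffs positions.
        ref_idx = next(
            (j for j, s in enumerate(stored)
             if len(s) == len(read) and len(delta_encode(read, s)) <= max_diffs),
            None,
        )
        if ref_idx is None:
            stored.append(read)
            placement.single_cloud.append(("raw", read))
        else:
            placement.single_cloud.append(
                ("delta", ref_idx, delta_encode(read, stored[ref_idx]))
            )
    return placement
```

For example, `route(["ACGTGATTACAT", "ACGTACGTACGT", "ACGTACCTACGT"], {"GATTACA"}, 7, 2)` would place the first read in the dispersal backend, store the second verbatim, and store the third as a one-position delta against the second. The linear similarity search is chosen only for readability; a real deduplication engine would need an index over stored reads.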
Pages: 19-24
Number of pages: 6