Enabling the Efficient, Dependable Cloud-based Storage of Human Genomes

Cited by: 1
Authors
Cogo, Vinicius Vielmo [1]
Bessani, Alysson [1]
Affiliation
[1] Univ Lisbon, Fac Ciencias, LASIGE, Lisbon, Portugal
Source
2019 38TH INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS WORKSHOPS (SRDSW 2019) | 2019
Funding
European Union Horizon 2020
Keywords
Data Storage; Dependability; Cloud; Genomes; Privacy
DOI
10.1109/SRDSW49218.2019.00011
Chinese Library Classification
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Efficiently storing large data sets of human genomes is a long-term ambition of both the research and clinical life sciences communities. For instance, biobanks stock thousands to millions of physical biological samples and are under pressure to also store the resulting digitized genomes. However, these and other life sciences institutions lack the infrastructure and expertise to store this data efficiently. Cloud computing is a natural, economical alternative to private infrastructures, but it is a weaker alternative in terms of security and privacy. In this work, we present an end-to-end composite pipeline intended to enable the efficient, dependable cloud-based storage of human genomes by integrating three mechanisms we have recently proposed: (1) a privacy-sensitivity detector for human genomes, (2) a similarity-based deduplication and delta-encoding algorithm for sequencing data, and (3) an auditability scheme to verify who has effectively read data in storage systems that use secure information dispersal. By integrating them with appropriate storage configurations, one can obtain reasonable privacy protection, security, and dependability guarantees at modest costs (e.g., less than $1/Genome/Year). Our preliminary analysis indicates that this pipeline costs only 3% more than non-replicated systems, 48% less than fully replicating all data, and 31% less than secure information dispersal schemes.
Pages: 19-24
Number of pages: 6