A Survey of the Theories and Methods of Privacy Preserving of Genome Data

Cited by: 0
Authors
Liu H. [1, 2, 3]
Peng C.-G. [1, 2, 3]
Wu Z.-Q. [4]
Tian Y.-L. [2, 3]
Tian F. [4]
Affiliations
[1] Guizhou Big Data Academy, Guizhou University, Guiyang
[2] College of Computer Science and Technology, Guizhou University, Guiyang
[3] State Key Laboratory of Public Big Data, Guizhou University, Guiyang
[4] School of Computer Science, Shaanxi Normal University, Xi'an
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2021 / Vol. 44 / No. 7
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Genomic privacy; Privacy leakage; Privacy metric; Privacy preserving; Utility metric
DOI
10.11897/SP.J.1016.2021.01430
Abstract
Genome data are widely used in scientific research, healthcare, legal and forensic settings, and direct-to-consumer services. Genome data can uniquely identify an individual and are closely associated with inheritance, health, phenotype, and kinship. Furthermore, genome data are stable over time. Improper management and abuse of genome data therefore raise privacy concerns. To address this problem, privacy-preserving technologies are used alongside the supervision of relevant laws and regulations. To this end, this paper surveys the theories and methods of privacy preservation for genome data. First, it outlines the genome data ecosystem from sequencing to applications and, based on the properties of genome data, analyzes the privacy leakage concerns in that ecosystem. Second, it summarizes the privacy threats to genome data in four categories: individual identification, linkage attack, genotype inference, and Bayesian inference. It compares these threats along five dimensions: scenario, data type, method, attack efficiency, and threat level. It also describes the equilibrium model between re-identification risk and the value of sharing genome data. Third, it presents privacy metrics for genome data in three categories: inaccuracy, uncertainty, and health privacy. It also summarizes utility metrics for genome data in seven categories: information loss, chi-square statistics, false positives and false negatives, error rate, accuracy rate, expected accuracy rate, and expected interval width. It then compares the privacy and utility metrics in terms of measurement method, measurement formula, protection effect, application scenario, attack difficulty, and adoption rate.
Fourth, the paper concludes that the genome data ecosystem consists of sequencing and storage, sharing and aggregation, research and analysis, healthcare, legal and forensic, and direct-to-consumer, and it analyzes the privacy leakage threats in sequencing and storage, sharing and aggregation, and applications. It also introduces privacy-preserving methods for genome data in four categories: cryptography, anonymity, differential privacy, and hybrid approaches, and compares them in terms of method, property, and protection effect. It classifies the existing privacy-preserving work according to the privacy concerns of the ecosystem and the corresponding privacy-preserving methods, and compares that work in terms of the target scenario and the protection effect achieved in that scenario. Finally, it compares and analyzes the existing privacy-preserving methods for genome data and discusses future challenges for genomic privacy in each part of the ecosystem: sequencing and storage, sharing and aggregation, research and analysis, healthcare, legal and forensic, and direct-to-consumer. This work provides a basis for addressing privacy leakage of genome data and promotes research on privacy preservation of genome data. © 2021, Science Press. All rights reserved.
Pages: 1430-1480
Page count: 50