A Survey of the Theories and Methods of Privacy Preserving of Genome Data

Cited by: 0
Authors
Liu H. [1, 2, 3]
Peng C.-G. [1, 2, 3]
Wu Z.-Q. [4]
Tian Y.-L. [2, 3]
Tian F. [4]
Affiliations
[1] Guizhou Big Data Academy, Guizhou University, Guiyang
[2] College of Computer Science and Technology, Guizhou University, Guiyang
[3] State Key Laboratory of Public Big Data, Guizhou University, Guiyang
[4] School of Computer Science, Shaanxi Normal University, Xi'an
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2021 / Vol. 44 / No. 7
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Genomic privacy; Privacy leakage; Privacy metric; Privacy preserving; Utility metric
DOI
10.11897/SP.J.1016.2021.01430
Abstract
Genome data have been widely applied in scientific research, healthcare, legal and forensic settings, and direct-to-consumer services. Genome data can uniquely identify an individual, are closely associated with inheritance, health, phenotype, and kinship, and remain stable over time. Improper management and abuse of genome data therefore raise serious privacy concerns. To address this problem, privacy-preserving technologies are used alongside the supervision of relevant laws and regulations to protect genome data. This paper surveys the theories and methods of privacy preserving of genome data. First, it describes the ecosystem of genome data from genome sequencing to applications and, based on the properties of genome data, analyzes the privacy leakage concerns in this ecosystem. Second, it summarizes the privacy threats to genome data in four categories: individual identification, linkage attacks, genotype inference, and Bayesian inference. It compares these threats in terms of scenario, data type, method, attack efficiency, and threat level, and presents an equilibrium model between re-identification risk and the value of sharing genome data. Third, it presents metrics for quantifying the privacy of genome data in three categories: inaccuracy, uncertainty, and health privacy. It also summarizes metrics for quantifying the utility of genome data in seven categories: information loss, chi-square statistics, false positives and false negatives, error rate, accuracy rate, expected accuracy rate, and expected interval width. The privacy and utility metrics are compared in terms of measurement method, measurement formula, protection effect, application scenario, attack difficulty, and adoption rate.
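The privacy metrics of inaccuracy and uncertainty, and the utility metrics of error rate, accuracy rate, and false positives/negatives, can be made concrete with a minimal Python sketch. It assumes one common formulation that may differ from the exact formulas in the survey: a genotype at a single SNP is coded as the number of minor alleles (0, 1 or 2), a hypothetical attacker holds a posterior distribution over those values, uncertainty is the Shannon entropy of that posterior, and inaccuracy is the expected distance between the attacker's guess and the true genotype. All function names are illustrative.

```python
import math
from typing import Dict, Sequence, Tuple

# Assumed encoding: genotype at one SNP = number of minor alleles (0, 1 or 2).
Posterior = Dict[int, float]  # hypothetical attacker belief: genotype -> probability


def uncertainty_bits(posterior: Posterior) -> float:
    """Privacy as uncertainty: Shannon entropy (in bits) of the attacker's posterior."""
    return -sum(p * math.log2(p) for p in posterior.values() if p > 0)


def inaccuracy(posterior: Posterior, true_genotype: int) -> float:
    """Privacy as inaccuracy: expected distance between the attacker's guess and the truth."""
    return sum(p * abs(g - true_genotype) for g, p in posterior.items())


def error_rate(released: Sequence[int], original: Sequence[int]) -> float:
    """Utility: fraction of released values that differ from the original ones."""
    if len(released) != len(original) or not original:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(r != o for r, o in zip(released, original)) / len(original)


def accuracy_rate(released: Sequence[int], original: Sequence[int]) -> float:
    """Utility: complement of the error rate."""
    return 1.0 - error_rate(released, original)


def fp_fn_rates(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Utility for a binary analysis (e.g. 'SNP is significant'): (FP rate, FN rate)."""
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    negatives = sum(not a for a in actual)
    positives = sum(bool(a) for a in actual)
    return (fp / negatives if negatives else 0.0, fn / positives if positives else 0.0)


if __name__ == "__main__":
    belief = {0: 0.1, 1: 0.7, 2: 0.2}
    print(round(inaccuracy(belief, true_genotype=1), 3))  # 0.3
    print(round(uncertainty_bits(belief), 3))  # 1.157
```

Higher entropy and higher expected error both mean the attacker knows less, so under this formulation they increase with privacy, while the utility functions reward released data that stays close to the original.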
Fourth, this paper summarizes the ecosystem of genome data as consisting of sequencing and storage, sharing and aggregation, research and analysis, healthcare, legal and forensic applications, and direct-to-consumer services, and analyzes the privacy leakage threats at the sequencing and storage, sharing and aggregation, and application stages. It then introduces privacy-preserving methods for genome data in four categories: cryptography, anonymity, differential privacy, and hybrid approaches, and compares them in terms of method, property, and protection effect. Existing privacy-preserving work is classified according to the privacy concerns it addresses in the ecosystem and the protection method it uses, and is compared in terms of target scenario and the protection effect achieved in that scenario. Finally, this paper compares and analyzes the existing methods of privacy preserving of genome data and discusses future challenges to genomic privacy in each part of the ecosystem: sequencing and storage, sharing and aggregation, research and analysis, healthcare, legal and forensic applications, and direct-to-consumer services. This work provides a basis for solving the problem of privacy leakage of genome data and promotes research on genomic privacy preserving. © 2021, Science Press. All rights reserved.
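Of the four categories of protection methods, differential privacy is the easiest to illustrate in a few lines. The Python sketch below, which is not the paper's method, releases per-SNP minor allele frequencies with the Laplace mechanism. It assumes that neighbouring cohorts differ by replacing one individual (so the cohort size n is public and each per-SNP count changes by at most 2), and it uses hypothetical helper names. Naive floating-point Laplace sampling has known weaknesses, so a deployed system would rely on a vetted DP library instead.

```python
import random
from typing import List, Optional

# Rows are individuals, columns are SNPs; each entry is a minor-allele count (0, 1 or 2).
Genotypes = List[List[int]]


def minor_allele_counts(genotypes: Genotypes) -> List[int]:
    """Per-SNP sum of minor-allele counts over the cohort."""
    return [sum(column) for column in zip(*genotypes)]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_allele_frequencies(genotypes: Genotypes, epsilon: float,
                          rng: Optional[random.Random] = None) -> List[float]:
    """Release epsilon-DP minor allele frequencies with the Laplace mechanism.

    Assumes neighbouring cohorts differ by replacing one individual, so the
    cohort size n is public and each per-SNP count changes by at most 2.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not genotypes:
        raise ValueError("empty cohort")
    if rng is None:
        rng = random.Random()
    n = len(genotypes)
    counts = minor_allele_counts(genotypes)
    # L1 sensitivity of the whole count vector: at most 2 per SNP, summed over all SNPs.
    scale = 2.0 * len(counts) / epsilon
    frequencies = []
    for count in counts:
        noisy = count + laplace_noise(scale, rng)
        # Clipping to [0, 2n] is post-processing, so it does not weaken the guarantee.
        noisy = min(max(noisy, 0.0), 2.0 * n)
        frequencies.append(noisy / (2.0 * n))
    return frequencies
```

Because the privacy budget is shared across every released SNP, the noise scale grows linearly with the number of SNPs; this is the privacy/utility trade-off that utility metrics such as error rate and accuracy rate are meant to quantify.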
Pages: 1430-1480
Number of pages: 50