Implications of Data Anonymization on the Statistical Evidence of Disparity

Cited by: 7
Authors
Xu, Heng [1]
Zhang, Nan [1]
Affiliations
[1] American University, Kogod School of Business, Washington, DC 20016, USA
Funding
National Science Foundation (USA)
Keywords
privacy; data anonymization; discrimination; statistical disparity; differential privacy; health disparities; k-anonymity; bias; protection; accuracy; security; proof
DOI
10.1287/mnsc.2021.4028
Chinese Library Classification (CLC) number
C93 [Management Science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
Research and practical development of data-anonymization techniques have proliferated in recent years. Yet limited attention has been paid to examining the potentially disparate impact of privacy protection on underprivileged subpopulations. This study is one of the first attempts to examine the extent to which data anonymization could mask the gross statistical disparities between subpopulations in the data. We first describe two common mechanisms of data anonymization and two prevalent types of statistical evidence for disparity. Then, we develop a conceptual foundation and mathematical formalism demonstrating that the two data-anonymization mechanisms have distinctive impacts on the identifiability of disparity, which also varies based on its statistical operationalization. After validating our findings with empirical evidence, we discuss the business and policy implications, highlighting the need for firms and policy makers to balance the protection of privacy against the recognition and rectification of disparate impact.
Pages: 2600-2618
Page count: 20