Balancing Utility and Fairness against Privacy in Medical Data

Cited: 0
Authors
Chester, Andrew [1 ]
Koh, Yun Sing [1 ]
Wicker, Jorg [1 ]
Sun, Quan [2 ]
Lee, Junjae [2 ]
Affiliations
[1] Univ Auckland, Sch Comp Sci, Auckland, New Zealand
[2] Orion Hlth, Auckland, New Zealand
Source
2020 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI) | 2020
Keywords
privacy; fairness; accuracy; imbalance; medicine; machine learning
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
There are numerous challenges when designing algorithms that interact with sensitive data, such as medical records. One of these challenges is privacy. However, there is a tension between privacy, utility (model accuracy), and fairness. While de-identification techniques, such as generalisation and suppression, have been proposed to enable privacy protection, they come at a cost, specifically to fairness and utility. Recent work on algorithmic fairness defines fairness as a guarantee of similar outputs for "similar" inputs; this notion is discussed in connection with de-identification. This research investigates the trade-off between privacy, fairness, and utility, whereas other work investigates only the trade-off between privacy and overall utility. We investigate the effects of two de-identification techniques, k-anonymity and differential privacy, on both utility and fairness, and we propose two measures to quantify the privacy-utility and privacy-fairness trade-offs. Other research has provided privacy guarantees with respect to utility; this research relies on those guarantees and focuses on the trade-offs at fixed de-identification levels. We discuss the effects of de-identification on data with different characteristics: class imbalance and outcome imbalance. We evaluated these effects on synthetic and real-world datasets. As a case study, we analysed the Medical Expenditure Panel Survey dataset.
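The sketch below is a minimal, illustrative take (not the authors' implementation) on the two de-identification techniques named in the abstract and on how privacy-utility and privacy-fairness trade-offs could be measured. The dataset columns, protected group labels, the epsilon value, and the trade-off formulas are all assumptions made for illustration; the paper's actual measures are not reproduced here.

```python
# Illustrative sketch only: k-anonymity via generalisation, the Laplace
# mechanism for epsilon-differential privacy, and assumed forms of a
# privacy-utility and a privacy-fairness measure.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy medical-style dataset: age (quasi-identifier), a protected group
# attribute, and a binary outcome. All values are synthetic.
df = pd.DataFrame({
    "age": rng.integers(20, 80, size=1000),
    "group": rng.integers(0, 2, size=1000),
    "outcome": rng.integers(0, 2, size=1000),
})

# k-anonymity via generalisation: bucket ages into 10-year bands and check
# that every quasi-identifier combination occurs at least k times.
def is_k_anonymous(data: pd.DataFrame, quasi_ids: list, k: int) -> bool:
    return data.groupby(quasi_ids).size().min() >= k

df["age_band"] = (df["age"] // 10) * 10
print("5-anonymous on age_band:", is_k_anonymous(df, ["age_band"], k=5))

# Epsilon-differential privacy via the Laplace mechanism: release a noisy
# count of positive outcomes (a counting query has sensitivity 1).
def dp_count(values: pd.Series, epsilon: float) -> float:
    return float(values.sum()) + rng.laplace(scale=1.0 / epsilon)

true_count = df["outcome"].sum()
noisy_count = dp_count(df["outcome"], epsilon=0.5)

# Assumed trade-off measures (not the paper's): utility loss as the relative
# error of the private release, and a fairness gap as the difference in noisy
# positive rates between the two groups.
utility_loss = abs(noisy_count - true_count) / true_count
rates = [
    dp_count(df.loc[df["group"] == g, "outcome"], epsilon=0.5)
    / (df["group"] == g).sum()
    for g in (0, 1)
]
fairness_gap = abs(rates[0] - rates[1])
print(f"utility loss: {utility_loss:.3f}, fairness gap: {fairness_gap:.3f}")
```

Under these assumed forms, lowering epsilon adds more Laplace noise, which typically increases the utility loss and can widen the per-group gap; this is the kind of tension between privacy level, utility, and fairness that the abstract describes.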
Pages: 1226-1233
Page count: 8
相关论文
共 22 条
  • [1] Agency for Healthcare Research and Quality (AHRQ),, MED EXP PAN SURV MEP
  • [2] Angwin J., 2016, MACHINE BIAS
  • [3] [Anonymous], 2017, CREDIT SCORING ITS A
  • [4] Avent B., 2019, ARXIV190510862
  • [5] Big Data's Disparate Impact
    Barocas, Solon
    Selbst, Andrew D.
    [J]. CALIFORNIA LAW REVIEW, 2016, 104 (03) : 671 - 732
  • [6] AI Fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias
    Bellamy, R. K. E.
    Dey, K.
    Hind, M.
    Hoffman, S. C.
    Houde, S.
    Kannan, K.
    Lohia, P.
    Martino, J.
    Mehta, S.
    Mojsilovie, A.
    Nagar, S.
    Ramamurthy, K. Natesan
    Richards, J.
    Saha, D.
    Sattigeri, P.
    Singh, M.
    Varshney, K. R.
    Zhang, Y.
    [J]. IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 2019, 63 (4-5)
  • [7] Besse P., 2020, ARXIV200314263
  • [8] Dua Dheeru, 2017, UCI Machine Learning Repository
  • [9] Calibrating noise to sensitivity in private data analysis
    Dwork, Cynthia
    McSherry, Frank
    Nissim, Kobbi
    Smith, Adam
    [J]. THEORY OF CRYPTOGRAPHY, PROCEEDINGS, 2006, 3876 : 265 - 284
  • [10] Fletcher Sam, 2015, International Journal of Computer Theory and Engineering, V7, P21, DOI 10.7763/IJCTE.2015.V7.924