Private measures, random walks, and synthetic data

被引:1
|
作者
Boedihardjo, March [1 ]
Strohmer, Thomas [2 ]
Vershynin, Roman [3 ]
机构
[1] Michigan State Univ, Dept Math, E Lansing, MI 48864 USA
[2] Univ Calif Davis, Dept Math, Davis, CA USA
[3] Univ Calif Irvine, Dept Math, Irvine, CA 92697 USA
关键词
Random walks; Differential privacy; Synthetic data;
D O I
10.1007/s00440-024-01279-z
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
Differential privacy is a mathematical concept that provides an information-theoretic security guarantee. While differential privacy has emerged as a de facto standard for guaranteeing privacy in data sharing, the known mechanisms to achieve it come with some serious limitations. Utility guarantees are usually provided only for a fixed, a priori specified set of queries. Moreover, there are no utility guarantees for more complex-but very common-machine learning tasks such as clustering or classification. In this paper we overcome some of these limitations. Working with metric privacy, a powerful generalization of differential privacy, we develop a polynomial-time algorithm that creates a private measure from a data set. This private measure allows us to efficiently construct private synthetic data that are accurate for a wide range of statistical analysis tools. Moreover, we prove an asymptotically sharp min-max result for private measures and synthetic data in general compact metric spaces, for any fixed privacy budget epsilon \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varepsilon $$\end{document} bounded away from zero. A key ingredient in our construction is a new superregular random walk, whose joint distribution of steps is as regular as that of independent random variables, yet which deviates from the origin logarithmically slowly.
引用
收藏
页码:569 / 611
页数:43
相关论文
共 50 条
  • [1] PHYSICAL MEASURES FOR NONLINEAR RANDOM WALKS ON INTERVAL
    Kleptsyn, V.
    Volk, D.
    MOSCOW MATHEMATICAL JOURNAL, 2014, 14 (02) : 339 - 365
  • [2] Scaling limit of the collision measures of multiple random walks
    Dinh-Toan Nguyen
    ALEA-LATIN AMERICAN JOURNAL OF PROBABILITY AND MATHEMATICAL STATISTICS, 2023, 20 (02): : 1385 - 1410
  • [3] Hamiltonian Cycles, Random Walks, and Discounted Occupational Measures
    Eshragh, Ali
    Filar, Jerzy
    MATHEMATICS OF OPERATIONS RESEARCH, 2011, 36 (02) : 258 - 270
  • [4] Does Differentially Private Synthetic Data Lead to Synthetic Discoveries?
    Perez, Ileana Montoya
    Movahedi, Parisa
    Nieminen, Valtteri
    Airola, Antti
    Pahikkala, Tapio
    METHODS OF INFORMATION IN MEDICINE, 2024, 63 (01/02) : 35 - 51
  • [5] Algorithmically Effective Differentially Private Synthetic Data
    He, Yiyun
    Vershynin, Roman
    Zhu, Yizhe
    THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023, 195
  • [6] Private Sampling: A Noiseless Approach for Generating Differentially Private Synthetic Data
    Boedihardjo, March
    Strohmer, Thomas
    Vershynin, Roman
    SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2022, 4 (03): : 1082 - 1115
  • [7] On uniqueness of invariant measures for random walks on HOMEO+ (R)
    Brofferio, Sara
    Buraczewski, Dariusz
    Szarek, Tomasz
    ERGODIC THEORY AND DYNAMICAL SYSTEMS, 2022, 42 (07) : 2207 - 2238
  • [8] Collaborative learning from distributed data with differentially private synthetic data
    Prediger, Lukas
    Jalko, Joonas
    Honkela, Antti
    Kaski, Samuel
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2024, 24 (01)
  • [9] Random Walks for Synthetic Aperture Radar Image Fusion in Framelet Domain
    Yang, Xiaoyuan
    Wang, Jingkai
    Zhu, Ridong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (02) : 851 - 865
  • [10] On random walks and switched random walks on homogeneous spaces
    Moreno, Elvira
    Velasco, Mauricio
    COMBINATORICS PROBABILITY AND COMPUTING, 2023, 32 (03) : 398 - 421