Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes

Cited: 0
Authors
Liao, Chonghua [1 ]
He, Jiafan [2 ]
Gu, Quanquan [2 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
[2] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
Source
ASIAN CONFERENCE ON MACHINE LEARNING, VOL 189 | 2022
Funding
National Science Foundation (NSF);
Keywords
Machine learning; Reinforcement learning; Differential privacy; Linear mixture MDPs;
DOI
None
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning (RL) algorithms can be used to provide personalized services, which rely on users' private and sensitive data. To protect users' privacy, privacy-preserving RL algorithms are in demand. In this paper, we study RL with linear function approximation and local differential privacy (LDP) guarantees. We propose a novel (ε, δ)-LDP algorithm for learning a class of Markov decision processes (MDPs) dubbed linear mixture MDPs, and obtain an Õ(d^{5/4} H^{7/4} T^{3/4} (log(1/δ))^{1/4} √(1/ε)) regret, where d is the dimension of the feature mapping, H is the length of the episodes, and T is the number of interactions with the environment. We also prove a lower bound Ω(dH√T / (e^ε (e^ε − 1))) for learning linear mixture MDPs under the ε-LDP constraint. Experiments on synthetic datasets verify the effectiveness of our algorithm. To the best of our knowledge, this is the first provable privacy-preserving RL algorithm with linear function approximation.
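To illustrate the local privacy model the abstract refers to, the sketch below shows the standard Gaussian mechanism applied locally by each user before any data reaches the server. This is only a minimal illustration of (ε, δ)-LDP on a scalar statistic, not the paper's algorithm; the function name, parameters, and noise calibration shown here are assumptions for the example.

```python
import math
import random

def gaussian_ldp(value, sensitivity, epsilon, delta):
    """Privatize a bounded scalar locally with the Gaussian mechanism.

    Noise scale is calibrated so each individual report satisfies
    (epsilon, delta)-differential privacy (standard calibration,
    valid for epsilon <= 1).
    """
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return value + random.gauss(0.0, sigma)

# Under LDP, each user perturbs their own statistic before sending it;
# the server only ever sees noisy reports and aggregates them.
random.seed(0)
reports = [gaussian_ldp(1.0, sensitivity=1.0, epsilon=0.8, delta=1e-5)
           for _ in range(10_000)]
estimate = sum(reports) / len(reports)
```

Averaging many noisy reports recovers the population statistic while no single user's raw value is ever revealed — the same trade-off that drives the extra T^{3/4} and 1/ε factors in the regret bound above.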
Pages: 16
References: 44