Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes

Cited: 0
Authors
Liao, Chonghua [1 ]
He, Jiafan [2 ]
Gu, Quanquan [2 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
[2] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
Source
ASIAN CONFERENCE ON MACHINE LEARNING, VOL 189 | 2022
Funding
National Science Foundation (NSF);
Keywords
Machine learning; Reinforcement learning; Differential privacy; Linear mixture MDPs;
DOI
None
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning (RL) algorithms can be used to provide personalized services, which rely on users' private and sensitive data. To protect users' privacy, privacy-preserving RL algorithms are in demand. In this paper, we study RL with linear function approximation and local differential privacy (LDP) guarantees. We propose a novel (ε, δ)-LDP algorithm for learning a class of Markov decision processes (MDPs) dubbed linear mixture MDPs, and obtain an Õ(d^{5/4} H^{7/4} T^{3/4} (log(1/δ))^{1/4} √(1/ε)) regret, where d is the dimension of the feature mapping, H is the length of the episodes, and T is the number of interactions with the environment. We also prove a lower bound Ω(dH√T / (e^ε (e^ε − 1))) for learning linear mixture MDPs under the ε-LDP constraint. Experiments on synthetic datasets verify the effectiveness of our algorithm. To the best of our knowledge, this is the first provable privacy-preserving RL algorithm with linear function approximation.
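To illustrate the local privacy model the abstract refers to, the sketch below shows the standard Gaussian mechanism applied locally by each user before any data reaches the server. This is only a minimal illustration of (ε, δ)-LDP on a scalar statistic, not the paper's algorithm; the function name, parameters, and noise calibration shown here are assumptions for the example.

```python
import math
import random

def gaussian_ldp(value, sensitivity, epsilon, delta):
    """Privatize a bounded scalar locally with the Gaussian mechanism.

    Noise scale is calibrated so each individual report satisfies
    (epsilon, delta)-differential privacy (standard calibration,
    valid for epsilon <= 1).
    """
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return value + random.gauss(0.0, sigma)

# Under LDP, each user perturbs their own statistic before sending it;
# the server only ever sees noisy reports and aggregates them.
random.seed(0)
reports = [gaussian_ldp(1.0, sensitivity=1.0, epsilon=0.8, delta=1e-5)
           for _ in range(10_000)]
estimate = sum(reports) / len(reports)
```

Averaging many noisy reports recovers the population statistic while no single user's raw value is ever revealed — the same trade-off that drives the extra T^{3/4} and 1/ε factors in the regret bound above.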
Pages: 16
References: 44