Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes

Cited by: 0

Authors:

Liao, Chonghua [1]

He, Jiafan [2]

Gu, Quanquan [2]

Affiliations:

[1] Tsinghua Univ, Dept Automat, Beijing, Peoples R China

[2] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA

Source:

ASIAN CONFERENCE ON MACHINE LEARNING, VOL 189 | 2022 / Vol. 189

Funding:

U.S. National Science Foundation;

Keywords:

Machine learning; Reinforcement learning; Differential privacy; Linear mixture MDPs;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Reinforcement learning (RL) algorithms can be used to provide personalized services, which rely on users' private and sensitive data. To protect users' privacy, privacy-preserving RL algorithms are in demand. In this paper, we study RL with linear function approximation and local differential privacy (LDP) guarantees. We propose a novel $(\varepsilon, \delta)$-LDP algorithm for learning a class of Markov decision processes (MDPs) dubbed linear mixture MDPs, which obtains an $\tilde{O}\big(d^{5/4} H^{7/4} T^{3/4} (\log(1/\delta))^{1/4} \sqrt{1/\varepsilon}\big)$ regret, where $d$ is the dimension of the feature mapping, $H$ is the length of the episodes, and $T$ is the number of interactions with the environment. We also prove a lower bound of $\Omega\big(dH\sqrt{T} / (e^{\varepsilon}(e^{\varepsilon} - 1))\big)$ for learning linear mixture MDPs under the $\varepsilon$-LDP constraint. Experiments on synthetic datasets verify the effectiveness of our algorithm. To the best of our knowledge, this is the first provable privacy-preserving RL algorithm with linear function approximation.

Pages: 16
