Differentially Private Reinforcement Learning with Linear Function Approximation

Cited by: 4
Authors
Zhou, Xingyu [1]
Affiliations
[1] Wayne State Univ, 5050 Anthony Wayne Dr, Detroit, MI 48202 USA
Keywords
reinforcement learning; differential privacy; linear function approximations; ALGORITHMS; DESIGN;
DOI
10.1145/3508028
Chinese Library Classification
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Motivated by the wide adoption of reinforcement learning (RL) in real-world personalized services, where users' sensitive and private information needs to be protected, we study regret minimization in finite-horizon Markov decision processes (MDPs) under the constraints of differential privacy (DP). Compared to existing private RL algorithms that work only on tabular finite-state, finite-action MDPs, we take the first step towards privacy-preserving learning in MDPs with large state and action spaces. Specifically, we consider MDPs with linear function approximation (in particular, linear mixture MDPs) under the notion of joint differential privacy (JDP), where the RL agent is responsible for protecting users' sensitive data. We design two private RL algorithms that are based on value iteration and policy optimization, respectively, and show that they enjoy sublinear regret while guaranteeing privacy protection. Moreover, the regret bounds are independent of the number of states and scale at most logarithmically with the number of actions, making the algorithms suitable for privacy protection in today's large-scale personalized services. Our results are achieved via a general procedure for learning in linear mixture MDPs under changing regularizers, which not only generalizes previous results for non-private learning but also serves as a building block for general private reinforcement learning.
Pages: 27
Related papers
50 in total
  • [31] Multiscale Q-learning with linear function approximation
    Bhatnagar, Shalabh
    Lakshmanan, K.
    DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 2016, 26 (03): : 477 - 509
  • [32] Reinforcement learning control with function approximation via multivariate simplex splines
    Feng, Yiting
    Zhou, Ye
    Ho, Hann Woei
    Mat Isa, Nor Ashidi
    INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2023,
  • [33] ON CONVERGENCE RATE OF ADAPTIVE MULTISCALE VALUE FUNCTION APPROXIMATION FOR REINFORCEMENT LEARNING
    Li, Tao
    Zhu, Quanyan
    2019 IEEE 29TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2019,
  • [34] Adaptive Fuzzy Function Approximation for Multi-Agent Reinforcement Learning
    Wu, Cheng
    Meleis, Waleed
    2009 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 2, 2009, : 169 - 176
  • [35] Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning
    Veeriah, Vivek
    van Seijen, Harm
    Sutton, Richard S.
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 556 - 564
  • [36] The Divergence of Reinforcement Learning Algorithms with Value-Iteration and Function Approximation
    Fairbank, Michael
    Alonso, Eduardo
    2012 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2012,
  • [37] Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation for Multi-Agent Reinforcement Learning
    Doan, Thinh T.
    Maguluri, Siva Theja
    Romberg, Justin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [38] Using Reinforcement Learning to Control Traffic Signals in a Real-World Scenario: An Approach Based on Linear Function Approximation
    Alegre, Lucas N.
    Ziemke, Theresa
    Bazzan, Ana L. C.
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (07) : 9126 - 9135
  • [39] Reinforcement learning with approximation spaces
    Peters, James F.
    Henry, Christopher
    FUNDAMENTA INFORMATICAE, 2006, 71 (2-3) : 323 - 349
  • [40] Differentially Private Hypothesis Transfer Learning
    Wang, Yang
    Gu, Quanquan
    Brown, Donald
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2018, PT II, 2019, 11052 : 811 - 826