Differentially Private Reinforcement Learning with Linear Function Approximation

被引:4
作者
Zhou, Xingyu [1 ]
机构
[1] Wayne State Univ, 5050 Anthony Wayne Dr, Detroit, MI 48202 USA
关键词
reinforcement learning; differential privacy; linear function approximations; ALGORITHMS; DESIGN;
D O I
10.1145/3508028
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Motivated by the wide adoption of reinforcement learning (RL) in real-world personalized services , where users' sensitive and private information needs to be protected, we study regret minimization in finite-horizon Markov decision processes (MDPs) under the constraints of differential privacy (DP). Compared to existing private RL algorithms that work only on tabular finite-state, finite-actions MDPs, we take the first step towards privacy-preserving learning in MDPs with large state and action spaces. Specifically, we consider MDPs with linear function approximation (in particular linear mixture MDPs) under the notion of joint differential privacy (JDP), where the RL agent is responsible for protecting users' sensitive data. We design two private RL algorithms that are based on value iteration and policy optimization, respectively, and show that they enjoy sub-linear regret performance while guaranteeing privacy protection. Moreover, the regret bounds are independent of the number of states, and scale at most logarithmically with the number of actions, making the algorithms suitable for privacy protection in nowadays large-scale personalized services. Our results are achieved via a general procedure for learning in linear mixture MDPs under changing regularizers, which not only generalizes previous results for non-private learning, but also serves as a building block for general private reinforcement learning.
引用
收藏
页数:27
相关论文
共 50 条
  • [21] Improved Differentially Private Euclidean Distance Approximation
    Stausholm, Nina Mesing
    PODS '21: PROCEEDINGS OF THE 40TH SIGMOD-SIGACT-SIGAI SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS, 2021, : 42 - 56
  • [22] Reinforcement learning algorithms with function approximation: Recent advances and applications
    Xu, Xin
    Zuo, Lei
    Huang, Zhenhua
    INFORMATION SCIENCES, 2014, 261 : 1 - 31
  • [23] A policy gradient reinforcement learning algorithm with fuzzy function approximation
    Gu, DB
    Yang, EF
    IEEE ROBIO 2004: Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2004, : 936 - 940
  • [24] A fuzzy-based function approximation technique for reinforcement learning
    Wu, Cheng
    Song, Huichun
    Yan, Changsheng
    Wang, Yiming
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2017, 32 (06) : 3909 - 3920
  • [25] Distributed Value Function Approximation for Collaborative Multiagent Reinforcement Learning
    Stankovic, Milos S.
    Beko, Marko
    Stankovic, Srdjan S.
    IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2021, 8 (03): : 1270 - 1280
  • [26] A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation
    Bhandari, Jalaj
    Russo, Daniel
    Singal, Raghav
    OPERATIONS RESEARCH, 2021, 69 (03) : 950 - 973
  • [27] Multi-agent reinforcement learning via knowledge transfer with differentially private noise
    Cheng, Zishuo
    Ye, Dayong
    Zhu, Tianqing
    Zhou, Wanlei
    Yu, Philip S.
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2022, 37 (01) : 799 - 828
  • [28] Differentially Private Distributed Learning
    Zhou, Yaqin
    Tang, Shaojie
    INFORMS JOURNAL ON COMPUTING, 2020, 32 (03) : 779 - 789
  • [29] Efficient exploration through active learning for value function approximation in reinforcement learning
    Akiyama, Takayuki
    Hachiya, Hirotaka
    Sugiyama, Masashi
    NEURAL NETWORKS, 2010, 23 (05) : 639 - 648
  • [30] Multiscale Q-learning with linear function approximation
    Shalabh Bhatnagar
    K. Lakshmanan
    Discrete Event Dynamic Systems, 2016, 26 : 477 - 509