Rethinking Discount Regularization: New Interpretations, Unintended Consequences, and Solutions for Regularization in Reinforcement Learning

Cited by: 0
Authors
Rathnam, Sarah [1 ]
Parbhoo, Sonali [2 ]
Swaroop, Siddharth [1 ]
Pan, Weiwei [1 ]
Murphy, Susan A. [1 ]
Doshi-Velez, Finale [1 ]
Affiliations
[1] Harvard Univ, John A Paulson Sch Engn & Appl Sci, Cambridge, MA 02138 USA
[2] Imperial Coll London, London SW7 2BX, England
Funding
National Science Foundation (US)
Keywords
reinforcement learning; regularization; certainty equivalence; discount factor; Markov decision process;
DOI
Not available
CLC number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
Discount regularization, using a shorter planning horizon when calculating the optimal policy, is a popular choice to avoid overfitting when faced with sparse or noisy data. It is commonly interpreted as de-emphasizing or ignoring delayed effects. In this paper, we prove two alternative views of discount regularization that expose unintended consequences and motivate novel regularization methods. In model-based RL, planning under a lower discount factor acts like a prior with stronger regularization on state-action pairs with more transition data. This leads to poor performance when the transition matrix is estimated from data sets with uneven amounts of data across state-action pairs. In model-free RL, discount regularization equates to planning using a weighted average Bellman update, where the agent plans as if the values of all state-action pairs are closer to one another than implied by the data. These views motivate new regularization methods that avoid the unintended consequences of a single global discount factor by setting parameters locally for individual state-action pairs rather than globally. We demonstrate the advantages of these state-action-specific methods across empirical examples with both tabular and continuous state spaces.
Pages: 1 - 48
Page count: 48
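
The abstract describes discount regularization as planning with a lower discount factor (a shorter horizon) than the one defining the task, typically after estimating the transition model from sparse data. The sketch below is a minimal illustration of that setup in a certainty-equivalence (model-based) setting; it is not the authors' code, and the toy MDP, function names, sample sizes, and discount values are assumptions made only for the demonstration.

import numpy as np

def value_iteration(P, R, gamma, tol=1e-8, max_iter=10_000):
    # Tabular value iteration.
    # P: (A, S, S) transition probabilities; R: (S, A) expected rewards.
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum("asn,n->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    # Greedy policy with respect to the final value function.
    Q = R + gamma * np.einsum("asn,n->sa", P, V)
    return V, Q.argmax(axis=1)

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2

# True (unknown) dynamics and rewards of a toy MDP.
P_true = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Estimate the transition matrix from a handful of sampled transitions per
# state-action pair; sparse data is the regime where discount regularization
# is typically applied.
counts = np.zeros((n_actions, n_states, n_states))
for a in range(n_actions):
    for s in range(n_states):
        samples = rng.choice(n_states, size=5, p=P_true[a, s])
        np.add.at(counts[a, s], samples, 1.0)
P_hat = counts / counts.sum(axis=-1, keepdims=True)

gamma_eval = 0.99   # discount defining the task we actually care about
gamma_plan = 0.90   # lower discount used only for planning (the regularizer)

V_reg, pi_reg = value_iteration(P_hat, R, gamma_plan)    # regularized policy
V_ref, pi_ref = value_iteration(P_true, R, gamma_eval)   # reference solution
print("greedy policy under discount regularization:", pi_reg)
print("greedy policy under the true model and full discount:", pi_ref)

The single global knob here is gamma_plan; the paper's argument is that this one global parameter implicitly regularizes state-action pairs unevenly, which is what motivates its state-action-specific alternatives.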