Rethinking Discount Regularization: New Interpretations, Unintended Consequences, and Solutions for Regularization in Reinforcement Learning

Cited by: 0
Authors
Rathnam, Sarah [1 ]
Parbhoo, Sonali [2 ]
Swaroop, Siddharth [1 ]
Pan, Weiwei [1 ]
Murphy, Susan A. [1 ]
Doshi-Velez, Finale [1 ]
Affiliations
[1] Harvard Univ, John A Paulson Sch Engn & Appl Sci, Cambridge, MA 02138 USA
[2] Imperial Coll London, London SW7 2BX, England
Funding
National Science Foundation (US)
Keywords
reinforcement learning; regularization; certainty equivalence; discount factor; Markov decision process;
DOI
Not available
CLC number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
Discount regularization, using a shorter planning horizon when calculating the optimal policy, is a popular choice to avoid overfitting when faced with sparse or noisy data. It is commonly interpreted as de-emphasizing or ignoring delayed effects. In this paper, we prove two alternative views of discount regularization that expose unintended consequences and motivate novel regularization methods. In model-based RL, planning under a lower discount factor acts like a prior with stronger regularization on state-action pairs with more transition data. This leads to poor performance when the transition matrix is estimated from data sets with uneven amounts of data across state-action pairs. In model-free RL, discount regularization equates to planning using a weighted average Bellman update, where the agent plans as if the values of all state-action pairs are closer to one another than implied by the data. These views motivate new regularization methods that avoid the unintended consequences of a single global discount factor by setting parameters locally for individual state-action pairs rather than globally. We demonstrate the advantages of these state-action-specific methods across empirical examples with both tabular and continuous state spaces.
Pages: 1 - 48
Page count: 48
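
The abstract describes discount regularization as planning with a lower discount factor (a shorter horizon) than the one defining the task, typically after estimating the transition model from sparse data. The sketch below is a minimal illustration of that setup in a certainty-equivalence (model-based) setting; it is not the authors' code, and the toy MDP, function names, sample sizes, and discount values are assumptions made only for the demonstration.

import numpy as np

def value_iteration(P, R, gamma, tol=1e-8, max_iter=10_000):
    # Tabular value iteration.
    # P: (A, S, S) transition probabilities; R: (S, A) expected rewards.
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum("asn,n->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    # Greedy policy with respect to the final value function.
    Q = R + gamma * np.einsum("asn,n->sa", P, V)
    return V, Q.argmax(axis=1)

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2

# True (unknown) dynamics and rewards of a toy MDP.
P_true = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Estimate the transition matrix from a handful of sampled transitions per
# state-action pair; sparse data is the regime where discount regularization
# is typically applied.
counts = np.zeros((n_actions, n_states, n_states))
for a in range(n_actions):
    for s in range(n_states):
        samples = rng.choice(n_states, size=5, p=P_true[a, s])
        np.add.at(counts[a, s], samples, 1.0)
P_hat = counts / counts.sum(axis=-1, keepdims=True)

gamma_eval = 0.99   # discount defining the task we actually care about
gamma_plan = 0.90   # lower discount used only for planning (the regularizer)

V_reg, pi_reg = value_iteration(P_hat, R, gamma_plan)    # regularized policy
V_ref, pi_ref = value_iteration(P_true, R, gamma_eval)   # reference solution
print("greedy policy under discount regularization:", pi_reg)
print("greedy policy under the true model and full discount:", pi_ref)

The single global knob here is gamma_plan; the paper's argument is that this one global parameter implicitly regularizes state-action pairs unevenly, which is what motivates its state-action-specific alternatives.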