Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms

Cited by: 3
Authors
Chen, Yurou [1, 2]
Zhang, Fengyi [1, 2]
Liu, Zhiyong [1, 2, 3]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Shanghai, Peoples R China
Keywords
Reinforcement learning; Policy gradient; Actor-critic; Value function; Bias-variance trade-off
DOI
10.1016/j.neunet.2023.10.023
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Actor-critic methods lead in many challenging continuous control tasks. Advantage estimators, the most common critics in the actor-critic framework, combine state values from bootstrapped value functions with sample returns. Different combinations trade off the bias introduced by the state values against the variance introduced by the sample returns in order to reduce estimation error. Because both bias and variance fluctuate throughout training, the optimal combination changes as well. Existing advantage estimators, however, usually use a fixed combination and therefore cannot track this changing trade-off. Our previous work on adaptive advantage estimation (AAE) analyzed the sources of bias and variance and proposed two indicators. This paper further explores the relationship between these indicators and the optimal combination through typical numerical experiments. Based on these analyses, we develop a general form of adaptive combination of state values and sample returns that achieves low estimation error. Empirical results on simulated robotic locomotion tasks show that the proposed estimators achieve similar or superior performance compared with the earlier generalized advantage estimator (GAE).
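For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the standard generalized advantage estimator (GAE) with a fixed combination weight. It is not the adaptive estimator proposed in this paper, whose exact form is not given in this record. The function name `gae_advantages`, the parameter names, and the default values of `gamma` and `lam` are illustrative assumptions.

```python
import numpy as np


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard GAE with a fixed lambda (illustrative sketch, not the paper's method).

    rewards: length-T sequence of rewards r_0 .. r_{T-1}
    values:  length-(T+1) sequence of value estimates V(s_0) .. V(s_T);
             the last entry is the bootstrap value (0 for a terminal state)
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    horizon = len(rewards)

    # One-step TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = rewards + gamma * values[1:] - values[:-1]

    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    advantages = np.zeros(horizon)
    running = 0.0
    for t in reversed(range(horizon)):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `lam=0` the estimate reduces to the one-step TD residual, which leans entirely on the value function (lower variance, more bias). With `lam=1` it becomes the discounted sample return minus the value baseline (less bias, higher variance). A fixed `lam` between these extremes is the kind of fixed combination the abstract contrasts with an adaptive one.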
Pages: 764-777
Number of pages: 14