Selective Dyna-Style Planning Under Limited Model Capacity

Cited by: 0
Authors
Abbas, Zaheer [1, 2]
Sokota, Samuel [1, 2]
Talvitie, Erin J. [3]
White, Martha [1, 2]
Affiliations
[1] Univ Alberta, Edmonton, AB, Canada
[2] Alberta Machine Intelligence Inst Amii, Edmonton, AB, Canada
[3] Harvey Mudd Coll, Claremont, CA 91711 USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119 | 2020 / Vol. 119
Funding
National Science Foundation (USA);
Keywords
ARCADE LEARNING-ENVIRONMENT; DROPOUT;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In model-based reinforcement learning, planning with an imperfect model of the environment has the potential to harm learning progress. But even when a model is imperfect, it may still contain information that is useful for planning. In this paper, we investigate the idea of using an imperfect model selectively. The agent should plan in parts of the state space where the model would be helpful but refrain from using the model where it would be harmful. An effective selective planning mechanism requires estimating predictive uncertainty, which arises out of aleatoric uncertainty, parameter uncertainty, and model inadequacy, among other sources. Prior work has focused on parameter uncertainty for selective planning. In this work, we emphasize the importance of model inadequacy. We show that heteroscedastic regression can signal predictive uncertainty arising from model inadequacy that is complementary to that which is detected by methods designed for parameter uncertainty, indicating that considering both parameter uncertainty and model inadequacy may be a more promising direction for effective selective planning than either in isolation.
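As a rough illustration of the heteroscedastic regression idea named in the abstract, the sketch below assumes a model that predicts both a mean and a log-variance for each target and is trained with a Gaussian negative log-likelihood. The abstract does not specify the loss or the planning rule used in the paper, so the function names, the `max_var` threshold, and the gating rule are hypothetical.

```python
import math


def gaussian_nll(target, mean, log_var):
    """Negative log-likelihood of `target` under N(mean, exp(log_var)).

    Assumption: a heteroscedastic model outputs `mean` and `log_var` per input.
    Inputs the model cannot fit well can lower this loss by raising `log_var`,
    so a large predicted variance can flag model inadequacy.
    """
    var = math.exp(log_var)
    return 0.5 * (log_var + (target - mean) ** 2 / var + math.log(2 * math.pi))


def should_plan(log_var, max_var=0.1):
    """Hypothetical gate: plan with the model only where predicted variance is small."""
    return math.exp(log_var) <= max_var
```

For a fixed prediction error, a larger predicted variance gives a lower loss than a small one, which is what lets the variance output act as an uncertainty signal; the gate then skips planning in states where that signal is high.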
Pages: 10