Bayesian Model-Based Offline Reinforcement Learning for Product Allocation

Cited by: 0
Authors
Jenkins, Porter [1 ]
Wei, Hua [2 ]
Jenkins, J. Stockton [1 ]
Li, Zhenhui [3 ]
Affiliations
[1] Brigham Young Univ, Provo, UT 84602 USA
[2] New Jersey Inst Technol, Newark, NJ 07102 USA
[3] Penn State Univ, University Pk, PA 16802 USA
Source
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022
DOI
Not available
CLC number
TP18 [Theory of artificial intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Product allocation in retail is the process of placing products throughout a store to connect consumers with relevant products. Discovering a good allocation strategy is challenging due to the scarcity of data and the high cost of experimentation in the physical world. Some work explores reinforcement learning (RL) as a solution, but these approaches are often limited by the sim2real problem. Learning policies from logged trajectories of a system is a key step forward for RL in physical systems. Recent work has shown that model-based offline RL can improve the effectiveness of offline policy estimation through uncertainty-penalized exploration. However, existing work assumes a continuous state space and access to a covariance matrix of the environment dynamics, which is not possible in the discrete case. To solve this problem, we propose a Bayesian model-based technique that naturally produces probabilistic estimates of the environment dynamics via the posterior predictive distribution, which we use for uncertainty-penalized exploration. We call our approach Posterior Penalized Offline Policy Optimization (PPOPO). We show that our world model better fits historical data due to informative priors, and that PPOPO outperforms other offline techniques in simulation and against real-world data.
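The abstract's core idea, a Bayesian world model over discrete dynamics whose posterior predictive supplies an uncertainty penalty, can be sketched in a few lines. The following is an illustrative sketch only, not the paper's exact formulation: it assumes a Dirichlet-categorical model of transition probabilities for each state-action pair and uses the summed variance of the posterior predictive as a hypothetical penalty u(s, a) subtracted from the reward; all names (`uncertainty_penalty`, `penalized_reward`, `lam`) are invented for illustration.

```python
import numpy as np

# Small discrete MDP: Dirichlet posterior over next-state probabilities
# for each (state, action) pair, updated from logged offline data.
n_states, n_actions = 5, 3

# Symmetric Dirichlet(1) prior; informative priors would change these counts.
alpha = np.ones((n_states, n_actions, n_states))

# Conjugate update from logged transitions (s, a, s').
logged_transitions = [(0, 1, 2), (0, 1, 2), (0, 1, 3), (4, 0, 1)]
for s, a, s_next in logged_transitions:
    alpha[s, a, s_next] += 1.0

def posterior_predictive(s, a):
    """Mean of the posterior predictive distribution over next states."""
    return alpha[s, a] / alpha[s, a].sum()

def uncertainty_penalty(s, a):
    """Summed marginal variance of the posterior predictive at (s, a).

    High for (s, a) pairs rarely seen in the logged data, low for
    well-covered pairs, so it can act as a pessimism term offline.
    """
    conc = alpha[s, a].sum()
    mean = alpha[s, a] / conc
    return float(np.sum(mean * (1.0 - mean) / (conc + 1.0)))

def penalized_reward(r, s, a, lam=1.0):
    """Pessimistic reward r - lam * u(s, a) for offline policy optimization."""
    return r - lam * uncertainty_penalty(s, a)
```

Because the Dirichlet is conjugate to the categorical likelihood, the posterior update is just count accumulation, and the penalty shrinks as a state-action pair accumulates logged visits; unvisited pairs keep the wide prior and are penalized most.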
Pages: 12531-12537
Page count: 7