Bayesian Model-Based Offline Reinforcement Learning for Product Allocation

Cited by: 0
Authors
Jenkins, Porter [1 ]
Wei, Hua [2 ]
Jenkins, J. Stockton [1 ]
Li, Zhenhui [3 ]
Affiliations
[1] Brigham Young Univ, Provo, UT 84602 USA
[2] New Jersey Inst Technol, Newark, NJ 07102 USA
[3] Penn State Univ, University Pk, PA 16802 USA
Source
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022
DOI
Not available
CLC number
TP18 [Theory of artificial intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Product allocation in retail is the process of placing products throughout a store to connect consumers with relevant products. Discovering a good allocation strategy is challenging due to the scarcity of data and the high cost of experimentation in the physical world. Some work explores reinforcement learning (RL) as a solution, but these approaches are often limited by the sim2real problem. Learning policies from logged trajectories of a system is a key step forward for RL in physical systems. Recent work has shown that model-based offline RL can improve the effectiveness of offline policy estimation through uncertainty-penalized exploration. However, existing work assumes a continuous state space and access to a covariance matrix of the environment dynamics, which is not possible in the discrete case. To solve this problem, we propose a Bayesian model-based technique that naturally produces probabilistic estimates of the environment dynamics via the posterior predictive distribution, which we use for uncertainty-penalized exploration. We call our approach Posterior Penalized Offline Policy Optimization (PPOPO). We show that our world model better fits historical data due to informative priors, and that PPOPO outperforms other offline techniques in simulation and against real-world data.
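The abstract's core idea, a Bayesian world model over discrete dynamics whose posterior predictive supplies an uncertainty penalty, can be sketched in a few lines. The following is an illustrative sketch only, not the paper's exact formulation: it assumes a Dirichlet-categorical model of transition probabilities for each state-action pair and uses the summed variance of the posterior predictive as a hypothetical penalty u(s, a) subtracted from the reward; all names (`uncertainty_penalty`, `penalized_reward`, `lam`) are invented for illustration.

```python
import numpy as np

# Small discrete MDP: Dirichlet posterior over next-state probabilities
# for each (state, action) pair, updated from logged offline data.
n_states, n_actions = 5, 3

# Symmetric Dirichlet(1) prior; informative priors would change these counts.
alpha = np.ones((n_states, n_actions, n_states))

# Conjugate update from logged transitions (s, a, s').
logged_transitions = [(0, 1, 2), (0, 1, 2), (0, 1, 3), (4, 0, 1)]
for s, a, s_next in logged_transitions:
    alpha[s, a, s_next] += 1.0

def posterior_predictive(s, a):
    """Mean of the posterior predictive distribution over next states."""
    return alpha[s, a] / alpha[s, a].sum()

def uncertainty_penalty(s, a):
    """Summed marginal variance of the posterior predictive at (s, a).

    High for (s, a) pairs rarely seen in the logged data, low for
    well-covered pairs, so it can act as a pessimism term offline.
    """
    conc = alpha[s, a].sum()
    mean = alpha[s, a] / conc
    return float(np.sum(mean * (1.0 - mean) / (conc + 1.0)))

def penalized_reward(r, s, a, lam=1.0):
    """Pessimistic reward r - lam * u(s, a) for offline policy optimization."""
    return r - lam * uncertainty_penalty(s, a)
```

Because the Dirichlet is conjugate to the categorical likelihood, the posterior update is just count accumulation, and the penalty shrinks as a state-action pair accumulates logged visits; unvisited pairs keep the wide prior and are penalized most.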
Pages: 12531-12537
Page count: 7