Experimental data-efficient reinforcement learning with an ensemble of surrogate models

Cited by: 0
Authors
Jiang, Jiazhou [1]
Chen, Zhiyong [1]
Affiliations
[1] School of Engineering, University of Newcastle, Callaghan, 2308, NSW
Keywords
Dynamics modeling; Ensemble model; Reinforcement learning; Surrogate model; Symbolic regression
DOI
10.1016/j.neunet.2025.107870
Abstract
Model-based reinforcement learning methods enhance sample efficiency by generating synthetic data during training. However, modeling errors can undermine training, leading to failures when the learned policy is deployed in the actual environment, especially due to discrepancies in the learned dynamics. In this paper, we propose a new ensemble of double surrogate models, constructed via symbolic regression, that uncovers the fundamental physical principles governing system behavior and thereby enables more data-efficient real-world applications. Symbolic regression improves interpretability by producing compact models that generalize well from limited data. These surrogate models interact with the reinforcement learning algorithm, so that all interactions occur within the synthetic models, significantly reducing the need for real experimental data. The double-surrogate structure mitigates model bias, preventing the agent from exploiting inaccuracies in the learned environment that would otherwise lead to poor performance. Our approach achieves comparable training performance when validated in real environments while requiring less than 1% of the experimental data typically needed by conventional reinforcement learning algorithms. © 2025 The Author(s)
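The abstract's core mechanism can be illustrated with a minimal sketch: a policy is rolled out entirely inside a pair of surrogate dynamics models, and the rollout is cut short wherever the two models disagree, so the agent cannot exploit regions where the surrogates are unreliable. Everything below is hypothetical and for illustration only: the 1-D dynamics, the function names, and the disagreement threshold are not taken from the paper, and the closed-form surrogates merely stand in for expressions that symbolic regression might discover.

```python
# Hypothetical pair of surrogate models for a 1-D system x' = f(x, u),
# standing in for closed-form expressions found by symbolic regression.
def surrogate_a(x, u):
    return x + 0.1 * u - 0.010 * x ** 3

def surrogate_b(x, u):
    return x + 0.1 * u - 0.012 * x ** 3

def synthetic_rollout(policy, x0, steps, max_disagreement=0.05):
    """Roll out a policy using only the surrogate ensemble.

    Each step queries both surrogates; the transition is their mean,
    and the rollout stops early where the models disagree, limiting
    the agent's ability to exploit surrogate-model bias.
    """
    x = x0
    trajectory = [x0]
    for _ in range(steps):
        u = policy(x)
        xa, xb = surrogate_a(x, u), surrogate_b(x, u)
        if abs(xa - xb) > max_disagreement:
            break  # ensemble disagreement: predictions deemed unreliable
        x = 0.5 * (xa + xb)  # use the ensemble mean as the next state
        trajectory.append(x)
    return trajectory

# Example: a simple stabilizing policy u = -x, rolled out for 20 steps.
traj = synthetic_rollout(lambda x: -x, x0=1.0, steps=20)
```

In a full training loop, trajectories like `traj` would replace real experiments as the agent's transition data, which is how interactions with the physical system are kept to a small fraction of what model-free methods require.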