ARLO: A framework for Automated Reinforcement Learning

Cited by: 2
Authors
Mussi, Marco [1 ]
Lombarda, Davide [2 ]
Metelli, Alberto Maria [1 ]
Trovo, Francesco [1 ]
Restelli, Marcello [1 ]
Affiliations
[1] Politecnico di Milano, Milan, Italy
[2] ML Cube, Milan, Italy
Keywords
AutoRL; Automated Reinforcement Learning;
DOI
10.1016/j.eswa.2023.119883
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Automated Reinforcement Learning (AutoRL) is a relatively new research area that is gaining increasing attention. AutoRL aims to make Reinforcement Learning (RL) techniques accessible to a broader audience by alleviating some of their main challenges, including data collection, algorithm selection, and hyper-parameter tuning. In this work, we propose a general and flexible framework, ARLO: Automated Reinforcement Learning Optimizer, for constructing automated AutoRL pipelines. Building on this framework, we propose one pipeline for offline RL and one for online RL, discussing their components and interactions and highlighting the differences between the two settings. Furthermore, we provide a Python implementation of these pipelines, released as an open-source library. Our implementation is tested on an illustrative LQG domain and on classic MuJoCo environments, showing that it reaches competitive performance with limited human intervention. We also showcase the full pipeline on a realistic dam environment, automatically performing the feature selection and model generation tasks.
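To illustrate the kind of automation the abstract describes, the following is a minimal, hypothetical sketch of one pipeline stage, hyper-parameter tuning, and is not the ARLO API: a random search over tabular Q-learning hyper-parameters (`alpha`, `epsilon`) on a toy chain MDP, selecting the configuration with the best greedy-rollout return. All names and the environment are illustrative assumptions.

```python
import random

N_STATES = 5  # chain 0..4; state 4 is the goal

def step(state, action):
    """Move right (action 1) or left (action 0); reward 1 on reaching the goal."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

def train_q_learning(alpha, epsilon, episodes=200, gamma=0.9, seed=0):
    """Tabular epsilon-greedy Q-learning; returns the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 100:
            a = rng.randrange(2) if rng.random() < epsilon else max((0, 1), key=lambda x: q[s][x])
            s2, r, done = step(s, a)
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s, steps = s2, steps + 1
    return q

def evaluate(q):
    """Return of a greedy rollout from the start state, capped at 20 steps."""
    s, total, done, steps = 0, 0.0, False, 0
    while not done and steps < 20:
        a = max((0, 1), key=lambda x: q[s][x])
        s, r, done = step(s, a)
        total, steps = total + r, steps + 1
    return total

# The "automated" part: search the hyper-parameter space instead of
# asking the user to pick alpha and epsilon by hand.
search_rng = random.Random(42)
best_score, best_cfg = float("-inf"), None
for trial in range(10):
    cfg = {"alpha": search_rng.uniform(0.05, 0.5),
           "epsilon": search_rng.uniform(0.05, 0.3)}
    score = evaluate(train_q_learning(**cfg, seed=trial))
    if score > best_score:
        best_score, best_cfg = score, cfg

print(best_cfg, best_score)
```

A full AutoRL pipeline would chain several such automated stages (data collection, feature selection, algorithm selection, tuning), but the select-by-evaluation loop above is the core pattern each stage shares.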
Pages: 12