Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems

Cited by: 0
Authors
Amarildo Likmeta
Alberto Maria Metelli
Giorgia Ramponi
Andrea Tirinzoni
Matteo Giuliani
Marcello Restelli
Affiliations
[1] Politecnico di Milano
[2] Università di Bologna
Source
Machine Learning | 2021 / Vol. 110
Keywords
Inverse reinforcement learning; Model-free IRL; Truly batch IRL; IRL for real life; Multiple experts IRL; Non-stationary IRL;
DOI
Not available
Abstract
In real-world applications, inferring the intentions of expert agents (e.g., human operators) can be fundamental to understanding how possibly conflicting objectives are managed, helping to interpret the demonstrated behavior. In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications. Scaling IRL to real-world cases has proved challenging, as typically only a fixed dataset of demonstrations is available and further interactions with the environment are not allowed. For this reason, we resort to a class of truly batch model-free IRL algorithms and we present three application scenarios: (1) the high-level decision-making problem in highway driving, (2) inferring user preferences in a social network (Twitter), and (3) the management of water release from Lake Como. For each of these scenarios, we provide a formalization, experiments, and a discussion interpreting the obtained results.
Pages: 2541-2576
Page count: 35