A DATASET PERSPECTIVE ON OFFLINE REINFORCEMENT LEARNING

Cited by: 0
Authors
Schweighofer, Kajetan [1,2]
Radler, Andreas [1,2]
Dinu, Marius-Constantin [1,2,4]
Hofmarcher, Markus [1,2]
Patil, Vihang [1,2]
Bitto-Nemling, Angela [1,2,3]
Eghbal-zadeh, Hamid [1,2,3]
Hochreiter, Sepp [1,2]
Affiliations
[1] Johannes Kepler Univ Linz, ELLIS Unit Linz, Inst Machine Learning, Linz, Austria
[2] Johannes Kepler Univ Linz, Inst Machine Learning, LIT AI Lab, Linz, Austria
[3] IARAI, Vienna, Austria
[4] Dynatrace Res, Linz, Austria
Source
CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 199 | 2022 / Vol. 199
Funding
EU Horizon 2020;
Keywords
CONCEPT DRIFT;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The application of Reinforcement Learning (RL) in real-world environments can be expensive or risky due to sub-optimal policies during training. In Offline RL, this problem is avoided, since interactions with an environment are prohibited. Policies are learned from a given dataset, which solely determines their performance. Despite this fact, how dataset characteristics influence Offline RL algorithms is still scarcely investigated. The dataset characteristics are determined by the behavioral policy that samples the dataset. Therefore, we define characteristics of behavioral policies as exploratory, for yielding high expected information in their interaction with the Markov Decision Process (MDP), and as exploitative, for having high expected return. We implement two corresponding empirical measures for the datasets sampled by the behavioral policy in deterministic MDPs. The first empirical measure, SACo, is defined by the normalized number of unique state-action pairs and captures exploration. The second empirical measure, TQ, is defined by the normalized average trajectory return and captures exploitation. Empirical evaluations show the effectiveness of TQ and SACo. In large-scale experiments using our proposed measures, we show that the unconstrained off-policy Deep Q-Network family requires datasets with high SACo to find a good policy. Furthermore, experiments show that policy constraint algorithms perform well on datasets with high TQ and SACo. Finally, the experiments show that purely dataset-constrained Behavioral Cloning performs competitively with the best Offline RL algorithms for datasets with high TQ.
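The two measures described in the abstract can be illustrated with a short sketch. The Python snippet below is a minimal, assumed implementation of how SACo and TQ could be computed for a dataset of trajectories from a deterministic MDP; the normalization baselines (a reference dataset for SACo, and random/best returns for TQ) are illustrative assumptions, not necessarily the exact normalizations used in the paper.

```python
# Minimal sketch (not the authors' reference code) of the two dataset measures
# described in the abstract, under assumed normalization baselines.
from typing import Hashable, List, Sequence, Tuple

Transition = Tuple[Hashable, Hashable, float]  # (state, action, reward)
Trajectory = List[Transition]


def unique_state_actions(trajectories: Sequence[Trajectory]) -> int:
    """Count distinct (state, action) pairs occurring in the dataset."""
    return len({(s, a) for traj in trajectories for (s, a, _) in traj})


def saco(dataset: Sequence[Trajectory], reference: Sequence[Trajectory]) -> float:
    """State-Action Coverage: unique (s, a) pairs in the dataset, normalized by
    the unique pairs of an assumed reference dataset (e.g. a broadly exploring one)."""
    return unique_state_actions(dataset) / max(unique_state_actions(reference), 1)


def tq(dataset: Sequence[Trajectory], random_return: float, best_return: float) -> float:
    """Trajectory Quality: average trajectory return, normalized between an assumed
    random-policy return and an assumed best (expert) return."""
    returns = [sum(r for (_, _, r) in traj) for traj in dataset]
    avg_return = sum(returns) / max(len(returns), 1)
    return (avg_return - random_return) / max(best_return - random_return, 1e-8)
```

Under these assumed baselines, a dataset collected by a highly exploratory policy would tend toward high SACo but low TQ, while an expert-generated dataset would show the opposite pattern, matching the exploration/exploitation distinction drawn in the abstract.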
Pages: 48
Related Papers
13 records in total
  • [1] Input addition and deletion in reinforcement: towards protean learning
    Bonnici, Iago
    Gouaich, Abdelkader
    Michel, Fabien
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2022, 36 (01)
  • [2] A Stitch in Time - Autonomous Model Management via Reinforcement Learning
    Liebman, Elad
    Zavesky, Eric
    Stone, Peter
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS (AAMAS' 18), 2018, : 990 - 998
  • [3] Adaptive and Reinforcement Learning Approaches for Online Network Monitoring and Analysis
    Wassermann, Sarah
    Cuvelier, Thibaut
    Mulinka, Pavol
    Casas, Pedro
    IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2021, 18 (02): : 1832 - 1849
  • [4] Reinforcement Online Active Learning Ensemble for Drifting Imbalanced Data Streams
    Zhang, Hang
    Liu, Weike
    Liu, Qingbao
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (08) : 3971 - 3983
  • [5] Reinforcement learning applied to a situation awareness decision-making model
    Costa, Renato D.
    Hirata, Celso M.
    INFORMATION SCIENCES, 2025, 704
  • [6] Online Ensemble Aggregation using Deep Reinforcement Learning for Time Series Forecasting
    Saadallah, Amal
    Morik, Katharina
    2021 IEEE 8TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2021,
  • [7] Reinforcement Learning-Based Streaming Process Discovery Under Concept Drift
    Cai, Rujian
    Zheng, Chao
    Wang, Jian
    Li, Duantengchuan
    Wang, Chong
    Li, Bing
    ADVANCED INFORMATION SYSTEMS ENGINEERING, CAISE 2024, 2024, 14663 : 55 - 70
  • [8] Concept Drift Detection and Adaption in Big Imbalance Industrial IoT Data Using an Ensemble Learning Method of Offline Classifiers
    Lin, Chun-Cheng
    Deng, Der-Jiunn
    Kuo, Chin-Hung
    Chen, Linnan
    IEEE ACCESS, 2019, 7 : 56198 - 56207
  • [9] Automated Concept Drift Handling for Fault Prediction in Edge Clouds Using Reinforcement Learning
    Shayesteh, Behshid
    Fu, Chunyan
    Ebrahimzadeh, Amin
    Glitho, Roch H.
    IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2022, 19 (02): : 1321 - 1335
  • [10] EMRIL: Ensemble Method based on ReInforcement Learning for binary classification in imbalanced drifting data streams
    Usman, Muhammad
    Chen, Huanhuan
    NEUROCOMPUTING, 2024, 605