Domain adaptation in reinforcement learning: a comprehensive and systematic study

Cited by: 7
Authors
Farhadi, Amirfarhad [1 ]
Mirzarezaee, Mitra [1 ]
Sharifi, Arash [1 ]
Teshnehlab, Mohammad [2 ]
Affiliations
[1] Islamic Azad Univ, Dept Comp Engn, Sci & Res Branch, Tehran 1477893855, Iran
[2] KN Toosi Univ Technol, Dept Control Engn, Tehran 1999143344, Iran
Keywords
Reinforcement learning; Domain adaptation; Machine learning; CLASSIFICATION
DOI
10.1631/FITEE.2300668
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Reinforcement learning (RL) has shown significant potential for dealing with complex decision-making problems. However, its performance relies heavily on the availability of a large amount of high-quality data. In many real-world situations, the data distribution in the target domain may differ significantly from that in the source domain, leading to a significant drop in the performance of RL algorithms. Domain adaptation (DA) strategies have been proposed to address this issue by transferring knowledge from a source domain to a target domain. However, there have been no comprehensive and in-depth studies evaluating these approaches. In this paper, we present a comprehensive and systematic study of DA in RL. We first introduce the basic concepts and formulations of DA in RL and then review the existing DA methods used in RL. Our main objective is to fill the existing literature gap regarding DA in RL. To achieve this, we conduct a rigorous evaluation of state-of-the-art DA approaches. We aim to provide comprehensive insights into DA in RL and contribute to advancing knowledge in this field. The existing DA approaches are divided into seven categories based on application domains. The approaches in each category are discussed in terms of important domain adaptation metrics, and their key characteristics are then described. Finally, challenging issues and future research trends are highlighted to assist researchers in developing innovative improvements.
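The core problem the survey addresses, a source-domain policy degrading when the target domain's reward or transition distribution shifts, and the simplest DA remedy, fine-tuning transferred knowledge on target data, can be illustrated with a toy example. The sketch below is not from the paper; it is a minimal, hypothetical two-armed bandit in which the best arm flips between domains, and the value estimates learned in the source are reused as initialization in the target:

```python
import random

def train_bandit(probs, steps, q=None, seed=0, eps=0.1, alpha=0.1):
    """Epsilon-greedy value estimation for a Bernoulli bandit.

    probs : per-arm success probabilities of the current domain
    q     : optional value estimates transferred from another domain
    """
    rng = random.Random(seed)
    if q is None:
        q = [0.0] * len(probs)
    for _ in range(steps):
        if rng.random() < eps:                       # explore
            a = rng.randrange(len(probs))
        else:                                        # exploit current estimates
            a = max(range(len(probs)), key=lambda i: q[i])
        r = 1.0 if rng.random() < probs[a] else 0.0  # Bernoulli reward
        q[a] += alpha * (r - q[a])                   # constant-step update
    return q

source = [0.8, 0.2]   # arm 0 is best in the source domain
target = [0.2, 0.8]   # distribution shift: arm 1 is best in the target domain

q_src = train_bandit(source, 2000)
# Zero-shot transfer would keep preferring arm 0, now the worse arm.
# Simple adaptation: fine-tune the transferred estimates on target data.
q_adapted = train_bandit(target, 2000, q=list(q_src), seed=1)
```

After fine-tuning, the adapted estimates prefer arm 1, while the untouched source estimates still prefer arm 0. Real DA methods surveyed in the paper are far more sophisticated (e.g., adversarial feature alignment), but they target the same mismatch this toy example exhibits.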
Pages: 1446-1465
Page count: 20