An Active Learning Framework for Efficient Robust Policy Search

Cited by: 0
Authors
Narayanaswami, Sai Kiran [1 ,3 ]
Sudarsanam, Nandan [2 ]
Ravindran, Balaraman [2 ]
Affiliations
[1] University of Texas at Austin, Austin, TX 78712, USA
[2] Indian Institute of Technology Madras, Robert Bosch Centre for Data Science and AI, Chennai, Tamil Nadu, India
[3] Indian Institute of Technology Madras, Robert Bosch Centre for Data Science and AI, Chennai, Tamil Nadu, India
Source
Proceedings of the 5th Joint International Conference on Data Science & Management of Data (CODS-COMAD 2022), 2022
Keywords
deep reinforcement learning; robust learning; active learning; robotics;
DOI
10.1145/3493700.3493712
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Robust Policy Search is the problem of learning policies whose performance does not degrade when they are subject to unseen environment model parameters. It is particularly relevant for transferring policies learned in a simulation environment to the real world. Several existing approaches sample large batches of trajectories that reflect the differences across possible environments, and then select some subset of these, such as those that yield the worst performance, to learn robust policies. We propose an active-learning-based framework, EffAcTS, to selectively choose model parameters for this purpose, so that only as much data is collected as is necessary to select such a subset. We instantiate this framework using Linear Bandits, and experimentally validate the gains in sample efficiency and the performance of our approach on standard continuous control tasks. We also present a Multi-Task Learning perspective on the problem of Robust Policy Search, and draw connections between our proposed framework and existing work on Multi-Task Learning.
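The abstract describes the EffAcTS loop only at a high level. The sketch below is a minimal, hypothetical illustration of that idea: a LinUCB-style linear bandit over a discretized range of environment parameters actively picks which setting to roll out next, and the worst-performing subset of the sampled parameters is then kept for a robust policy update. The parameter grid, the lower-confidence-bound rule, the `evaluate_policy` placeholder, and the 10% worst-case cut are all illustrative assumptions, not details taken from the paper.

```python
# Hypothetical EffAcTS-style loop: a linear bandit selects environment parameters
# to evaluate, and the worst-performing ones are kept for the robust policy update.
import numpy as np

def evaluate_policy(env_param: float) -> float:
    """Placeholder: return of the current policy in an environment whose model
    parameter (e.g. a link mass or friction coefficient) is env_param."""
    return -(env_param - 1.5) ** 2 + np.random.normal(scale=0.1)

# Discretize the environment-parameter range into candidate "arms"
# with a simple polynomial feature basis for the linear bandit.
params = np.linspace(0.5, 2.5, num=21)
features = np.stack([np.ones_like(params), params, params ** 2], axis=1)
d = features.shape[1]

A = np.eye(d)        # ridge-regularized design matrix
b = np.zeros(d)      # accumulated reward-weighted features
alpha = 1.0          # exploration width
budget = 40          # rollout budget per policy iteration

returns = {}
for _ in range(budget):
    theta = np.linalg.solve(A, b)
    # Lower confidence bound: we are actively hunting for *badly* performing
    # parameters, so we pick the arm whose return could plausibly be lowest.
    lcb = features @ theta - alpha * np.sqrt(
        np.einsum("ij,jk,ik->i", features, np.linalg.inv(A), features)
    )
    i = int(np.argmin(lcb))
    r = evaluate_policy(params[i])
    returns[params[i]] = r
    A += np.outer(features[i], features[i])
    b += r * features[i]

# Keep the worst-performing fraction of sampled parameters for the robust update,
# in the spirit of worst-case / CVaR-style subset selection.
worst = sorted(returns, key=returns.get)[: max(1, len(returns) // 10)]
print("parameters selected for the robust policy update:", worst)
```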
Pages: 1-9