Contrastive Intrinsic Control for Unsupervised Reinforcement Learning

Cited: 0
Authors
Laskin, Michael [1]
Liu, Hao [1]
Peng, Xue Bin [1]
Yarats, Denis [2,3]
Rajeswaran, Aravind [1,3]
Abbeel, Pieter [1,4]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] NYU, New York, NY USA
[3] Meta AI, New York, NY USA
[4] Covariant, Berkeley, CA USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We introduce Contrastive Intrinsic Control (CIC), an unsupervised reinforcement learning (RL) algorithm that maximizes the mutual information between state-transitions and latent skill vectors. CIC utilizes contrastive learning between state-transitions and skill vectors to learn behaviour embeddings and maximizes the entropy of these embeddings as an intrinsic reward to encourage behavioural diversity. We evaluate our algorithm on the Unsupervised RL Benchmark (URLB) in the asymptotic state-based setting, which consists of a long reward-free pretraining phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. We find that CIC improves over prior exploration algorithms in terms of adaptation efficiency to downstream tasks on state-based URLB.
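To illustrate the two ingredients the abstract describes, the sketch below (not the authors' released code) pairs an InfoNCE-style contrastive loss between state-transition embeddings and skill vectors with a k-nearest-neighbour particle estimate of embedding entropy used as the intrinsic reward. The network sizes, the entropy-estimator details, and all names (CICSketch, contrastive_loss, intrinsic_reward) are illustrative assumptions.

```python
# Minimal sketch of CIC-style training signals, under the assumptions above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CICSketch(nn.Module):
    def __init__(self, obs_dim, skill_dim, hidden_dim=256):
        super().__init__()
        # Embeds a state transition (s_t, s_{t+1}) into the skill space.
        self.state_net = nn.Sequential(
            nn.Linear(2 * obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, skill_dim),
        )
        # Projects the latent skill vector into the same space.
        self.skill_net = nn.Sequential(
            nn.Linear(skill_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, skill_dim),
        )

    def contrastive_loss(self, obs, next_obs, skill, temperature=0.5):
        """InfoNCE-style loss: each transition should match its own skill."""
        query = self.state_net(torch.cat([obs, next_obs], dim=-1))  # (B, D)
        key = self.skill_net(skill)                                 # (B, D)
        query = F.normalize(query, dim=-1)
        key = F.normalize(key, dim=-1)
        logits = query @ key.t() / temperature                      # (B, B)
        labels = torch.arange(logits.size(0), device=logits.device)
        return F.cross_entropy(logits, labels)

    @torch.no_grad()
    def intrinsic_reward(self, obs, next_obs, k=12):
        """Particle (k-nearest-neighbour) entropy estimate over embeddings."""
        emb = self.state_net(torch.cat([obs, next_obs], dim=-1))    # (B, D)
        dists = torch.cdist(emb, emb)                                # (B, B)
        knn_dists, _ = dists.topk(k + 1, largest=False)              # incl. self (needs batch > k)
        # A larger distance to the k-th neighbour means the embedding lies in a
        # sparser, less-visited region, so it receives a higher intrinsic reward.
        return torch.log(1.0 + knn_dists[:, -1])
```

In rough correspondence with the pipeline the abstract outlines, one would sample a skill vector per pretraining episode, optimise the policy on intrinsic_reward while minimising contrastive_loss on replay batches, and then swap in the extrinsic task reward during the short adaptation phase.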
Pages: 14