Contrastive Learning Methods for Deep Reinforcement Learning

Cited: 3
Authors
Wang, Di [1 ]
Hu, Mengqi [1 ]
Affiliations
[1] Univ Illinois, Dept Mech & Ind Engn, Chicago, IL 60609 USA
Funding
U.S. National Science Foundation
Keywords
Contrastive learning; deep reinforcement learning; different-age experience; experience replay buffer; parallel learning
DOI
10.1109/ACCESS.2023.3312383
Chinese Library Classification
TP [Automation & Computer Technology]
Discipline Code
0812
Abstract
Deep reinforcement learning (DRL) has shown promising performance in various application areas (e.g., games and autonomous vehicles). The experience replay buffer and parallel learning are two strategies widely used to boost the performance of offline and online DRL algorithms. However, state-action distribution shifts lead to bootstrapping errors. An experience replay buffer learns policies from older experience trajectories, which restricts its use to off-policy algorithms, and balancing new against old experience is challenging. Parallel learning strategies can train policies with online experience, but parallel environment instances organize the agent pool inefficiently and incur higher simulation or physical costs. To overcome these shortcomings, we develop four lightweight and effective DRL algorithms, the instance-actor, parallel-actor, instance-critic, and parallel-critic methods, which contrast trajectory experiences of different ages. We train the contrastive DRL agent using the received rewards and a proposed contrastive loss, computed from designed positive/negative keys. Benchmark experiments in PyBullet robotics environments show that our proposed algorithms match or outperform state-of-the-art DRL algorithms.
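The abstract does not spell out the contrastive loss over positive/negative keys; objectives of this kind are typically InfoNCE-style, so the following is a minimal NumPy sketch under that assumption. All names, the temperature value, and the toy embeddings are illustrative, not the paper's implementation:

```python
import numpy as np

def info_nce_loss(query, positive_key, negative_keys, temperature=0.1):
    """InfoNCE-style contrastive loss: pull the query embedding toward its
    positive key and push it away from the negative keys (L2-normalized)."""
    def normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    q = normalize(query)             # shape (d,)
    pos = normalize(positive_key)    # shape (d,)
    negs = normalize(negative_keys)  # shape (k, d)

    # Similarity logits: the positive key first, then the k negative keys.
    logits = np.concatenate(([q @ pos], negs @ q)) / temperature
    # Cross-entropy with the positive at index 0, via a stable log-sum-exp.
    logits -= logits.max()
    return -logits[0] + np.log(np.exp(logits).sum())

rng = np.random.default_rng(0)
d, k = 32, 16
anchor = rng.normal(size=d)
# Aligned positive (e.g., a slightly perturbed view of the same trajectory).
loss_easy = info_nce_loss(anchor, anchor + 0.01 * rng.normal(size=d),
                          rng.normal(size=(k, d)))
# An unrelated "positive" is as hard to pick out as any negative.
loss_hard = info_nce_loss(anchor, rng.normal(size=d),
                          rng.normal(size=(k, d)))
print(f"easy={loss_easy:.3f}  hard={loss_hard:.3f}")
```

Minimizing a loss of this shape drives the query's similarity to its positive key above its similarity to every negative key, which is how contrasting different-age trajectory experiences could be operationalized.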
Pages: 97107-97117 (11 pages)