Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching

Cited by: 0
Authors
Suh, H. J. Terry [1 ]
Chou, Glen [1 ]
Dai, Hongkai [2 ]
Yang, Lujie [1 ]
Gupta, Abhishek [3 ]
Tedrake, Russ [1 ]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] Toyota Res Inst, Los Altos, CA 94022 USA
[3] Univ Washington, Seattle, WA 98195 USA
Source
CONFERENCE ON ROBOT LEARNING, 2023, Vol. 229
Keywords
Diffusion; Score-Matching; Offline; Model-Based Reinforcement Learning; Imitation Learning; Planning under Uncertainty;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Gradient-based methods enable efficient search in high dimensions. However, to apply them effectively in offline optimization paradigms such as offline Reinforcement Learning (RL) or Imitation Learning (IL), we require a more careful consideration of how uncertainty estimation interacts with the first-order methods that attempt to minimize it. We study smoothed distance to data as an uncertainty metric, and claim that it has two beneficial properties: (i) it allows gradient-based methods that attempt to minimize uncertainty to drive iterates to the data as smoothing is annealed, and (ii) it facilitates analysis of model bias with Lipschitz constants. As distance to data can be expensive to compute online, we consider settings where we need to amortize this computation. Instead of learning the distance itself, however, we propose to learn its gradients directly, as an oracle for first-order optimizers. We show that these gradients can be efficiently learned with score-matching techniques by leveraging the equivalence between distance to data and data likelihood. Using this insight, we propose Score-Guided Planning (SGP), a planning algorithm for offline RL that utilizes score matching to enable first-order planning in high-dimensional problems where zeroth-order methods were unable to scale and ensembles were unable to overcome local minima. Website: https://sites.google.com/view/score-guided-planning/home
Pages: 27
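The core idea described in the abstract, learning the gradient of a smoothed distance-to-data penalty via score matching and using it as an oracle inside a first-order planner, can be sketched in a few lines. The following is a minimal illustrative sketch in PyTorch, not the authors' released implementation; the names (ScoreNet, dsm_loss, first_order_plan_step, the grad_task_cost callable, the weight beta) and the simple MLP architecture are assumptions made for illustration.

import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    # Approximates s_theta(x) ~ grad_x log p_sigma(x), the score of the
    # Gaussian-smoothed data distribution (equivalently, up to scale, the
    # negative gradient of the smoothed distance to data).
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):
        return self.net(x)

def dsm_loss(score_net, x, sigma):
    # Denoising score matching: for x_noisy = x + sigma * eps, the regression
    # target is grad log q_sigma(x_noisy | x) = -eps / sigma.
    eps = torch.randn_like(x)
    x_noisy = x + sigma * eps
    target = -eps / sigma
    return ((score_net(x_noisy) - target) ** 2).mean()

def first_order_plan_step(score_net, z, grad_task_cost, step_size=1e-2, beta=1.0):
    # One gradient step on (task cost + beta * smoothed distance to data).
    # grad_task_cost is a user-supplied callable returning the gradient of the
    # task objective at the decision variables z (e.g., via autograd through a
    # learned dynamics model).
    g_cost = grad_task_cost(z)
    with torch.no_grad():
        g_uncertainty = -score_net(z)  # learned score points toward the data
    return z - step_size * (g_cost + beta * g_uncertainty)

In this sketch, training would minimize dsm_loss over the offline dataset at a sequence of noise levels, and planning would repeat first_order_plan_step while annealing sigma, so that iterates are drawn toward the data as smoothing is reduced, as the abstract describes.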