Towards Better Laplacian Representation in Reinforcement Learning with Generalized Graph Drawing

Cited by: 0

Authors:

Wang, Kaixin [1]

Zhou, Kuangqi [1]

Zhang, Qixin [2]

Shao, Jie [3]

Hooi, Bryan [1]

Feng, Jiashi [1]

Affiliations:

[1] National University of Singapore, Singapore

[2] City University of Hong Kong, Hong Kong, People's Republic of China

[3] ByteDance AI Lab, Beijing, People's Republic of China

Source:

International Conference on Machine Learning, Vol. 139 | 2021 / Volume 139

Funding:

National Research Foundation, Singapore

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The Laplacian representation has recently gained increasing attention in reinforcement learning (RL) because it provides a succinct and informative representation of states, taking the eigenvectors of the Laplacian matrix of the state-transition graph as state embeddings. Such a representation captures the geometry of the underlying state space and benefits RL tasks such as option discovery and reward shaping. To approximate the Laplacian representation in large (or even continuous) state spaces, recent works propose minimizing a spectral graph drawing objective, which, however, has infinitely many global minimizers other than the eigenvectors. As a result, the learned Laplacian representation may differ from the ground truth. To solve this problem, we reformulate the graph drawing objective into a generalized form and derive a new learning objective, which is proved to have the eigenvectors as its unique global minimizer. This enables learning high-quality Laplacian representations that faithfully approximate the ground truth. We validate this via comprehensive experiments on a set of gridworld and continuous control environments. Moreover, we show that our learned Laplacian representations lead to more exploratory options and better reward shaping.
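The non-uniqueness mentioned in the abstract can be illustrated on a toy graph. The sketch below is not the authors' implementation; it assumes a hypothetical 10-state chain as the state-transition graph, uses the combinatorial Laplacian L = D - A, and takes the graph drawing objective to be the sum of u_i^T L u_i over orthonormal columns u_i. It shows that rotating the eigenvectors leaves this objective unchanged. As one possible way to break the symmetry (an assumption made here for illustration; the abstract does not spell out the generalized form), each term is weighted by a strictly decreasing coefficient, after which the rotated solution scores strictly worse than the eigenvectors.

```python
import numpy as np

def chain_laplacian(n):
    """Combinatorial Laplacian L = D - A of an n-state chain (hypothetical toy graph)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def graph_drawing_objective(L, U, coeffs=None):
    """Return sum_i c_i * u_i^T L u_i over the columns u_i of U (c_i = 1 by default)."""
    per_dim = np.einsum("ni,nm,mi->i", U, L, U)
    if coeffs is None:
        coeffs = np.ones(U.shape[1])
    return float(np.dot(coeffs, per_dim))

n, d = 10, 2
L = chain_laplacian(n)

# Ground-truth Laplacian representation: eigenvectors of the d smallest eigenvalues.
eigvals, eigvecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
U = eigvecs[:, :d]

# Rotate the embedding; the columns stay orthonormal but are no longer eigenvectors.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
U_rot = U @ R

# Unweighted objective: identical for both, so the minimizer is not unique.
plain_eig = graph_drawing_objective(L, U)
plain_rot = graph_drawing_objective(L, U_rot)

# Weighted objective (assumed coefficients, strictly decreasing): rotation is penalized.
coeffs = np.array([2.0, 1.0])
weighted_eig = graph_drawing_objective(L, U, coeffs)
weighted_rot = graph_drawing_objective(L, U_rot, coeffs)

print(plain_eig, plain_rot)
print(weighted_eig, weighted_rot)
```

The first printed pair agrees up to floating-point error, while in the second pair the rotated embedding has the larger value. In large or continuous state spaces the eigendecomposition is not available, which is why the abstract speaks of minimizing an objective instead.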

Pages: 10
