Generalization and Computation for Policy Classes of Generative Adversarial Imitation Learning

Times cited: 3
Authors
Zhou, Yirui [1]
Zhang, Yangchun [1]
Liu, Xiaowei [1]
Wang, Wanying [1]
Che, Zhengping [2]
Xu, Zhiyuan [2]
Tang, Jian [2]
Peng, Yaxin [1]
Affiliations
[1] Shanghai University, School of Science, Department of Mathematics, Shanghai 200444, People's Republic of China
[2] Midea Group, AI Innovation Center, Shanghai 201702, People's Republic of China
Source
Parallel Problem Solving from Nature - PPSN XVII, PPSN 2022, Part I | 2022 / Vol. 13398
Keywords
Generative adversarial imitation learning; Generalization; Computation; Policy classes
DOI
10.1007/978-3-031-14714-2_27
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Generative adversarial imitation learning (GAIL) learns an optimal policy from expert demonstrations in environments whose reward function is unknown. Unlike existing works, which study the generalization of reward function classes or discriminator classes, we focus on policy classes. This paper investigates generalization and computation for the policy classes of GAIL. Our contributions are twofold: 1) we prove that generalization in GAIL is guaranteed when the complexity of the policy class is properly controlled; 2) we propose an off-policy framework called two-stage stochastic gradient (TSSG), which solves GAIL efficiently on the basis of soft policy iteration and attains a sublinear convergence rate to a stationary solution. Comprehensive numerical simulations in MuJoCo environments are presented.
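The abstract states that TSSG builds on soft policy iteration. For readers unfamiliar with that building block, the following is a minimal tabular sketch of generic soft policy iteration driven by a GAIL-style surrogate reward. It is not the authors' TSSG algorithm, whose details are not given in this record. The function names `soft_policy_iteration` and `gail_reward`, the reward convention r(s, a) = -log(1 - D(s, a)), the fixed discriminator values, and the toy MDP are all illustrative assumptions.

```python
import numpy as np


def soft_policy_iteration(P, r, gamma=0.9, alpha=1.0, n_iters=50, n_eval=200):
    """Generic tabular soft policy iteration (illustrative, not TSSG).

    P: transition tensor of shape (S, A, S), P[s, a, s2] = Pr(s2 | s, a).
    r: reward matrix of shape (S, A).
    alpha: entropy temperature.
    Returns the final policy pi (S, A) and its soft Q-function estimate (S, A).
    """
    S, A = r.shape
    pi = np.full((S, A), 1.0 / A)  # start from the uniform policy
    Q = np.zeros((S, A))
    for _ in range(n_iters):
        # Soft policy evaluation: repeatedly apply the soft Bellman operator
        #   Q(s,a) = r(s,a) + gamma * E_{s2}[V(s2)],
        #   V(s)   = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s)).
        for _ in range(n_eval):
            V = np.sum(pi * (Q - alpha * np.log(pi)), axis=1)
            Q = r + gamma * (P @ V)  # (S, A, S) @ (S,) -> (S, A)
        # Soft policy improvement: pi(a|s) proportional to exp(Q(s,a) / alpha).
        logits = Q / alpha
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        pi = np.exp(logits)
        pi = pi / pi.sum(axis=1, keepdims=True)
    return pi, Q


def gail_reward(D, eps=1e-8):
    """One common GAIL surrogate reward, r = -log(1 - D), where D(s, a) is a
    (hypothetical) discriminator's probability that (s, a) came from the expert."""
    return -np.log(np.clip(1.0 - D, eps, 1.0))


if __name__ == "__main__":
    # Toy 2-state, 2-action MDP with uniform transitions (illustrative only).
    P = np.full((2, 2, 2), 0.5)
    D = np.array([[0.9, 0.2],
                  [0.3, 0.8]])  # hypothetical fixed discriminator outputs
    pi, Q = soft_policy_iteration(P, gail_reward(D))
    print(np.round(pi, 3))
```

With uniform transitions the future value does not depend on the action, so the improved policy reduces to pi(a|s) proportional to exp(r(s, a)/alpha), which for alpha = 1 is proportional to 1/(1 - D(s, a)); in each state the most probable action is the one the discriminator rates as most expert-like. In full GAIL the discriminator is retrained between policy updates rather than held fixed.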
Pages: 385-399
Number of pages: 15