Apprenticeship Learning via Frank-Wolfe

Cited by: 0

Authors:

Zahavy, Tom [1]

Cohen, Alon [1]

Kaplan, Haim [1]

Mansour, Yishay [1]

Affiliations:

[1] Google Research, Tel Aviv, Israel

Source:

Thirty-Fourth AAAI Conference on Artificial Intelligence, the Thirty-Second Innovative Applications of Artificial Intelligence Conference and the Tenth AAAI Symposium on Educational Advances in Artificial Intelligence | 2020 / Vol. 34

Keywords:

CONVERGENCE;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider applications of the Frank-Wolfe (FW) algorithm to Apprenticeship Learning (AL). In this setting, we are given a Markov Decision Process (MDP) without an explicit reward function. Instead, we observe an expert that acts according to some policy, and the goal is to find a policy whose feature expectations are closest to those of the expert policy. We formulate this problem as finding the projection of the expert's feature expectations onto the feature expectations polytope, that is, the convex hull of the feature expectations of all the deterministic policies in the MDP. We show that this formulation is equivalent to the AL objective and that solving this problem using the FW algorithm is equivalent to the well-known Projection method of Abbeel and Ng (2004). This insight allows us to analyze AL with tools from the convex optimization literature and to derive tighter convergence bounds on AL. Specifically, we show that a variation of the FW method that is based on taking "away steps" achieves a linear rate of convergence when applied to AL, and that a stochastic version of the FW algorithm can be used to avoid precise estimation of feature expectations. We also experimentally show that this version outperforms the FW baseline. To the best of our knowledge, this is the first work that shows linear convergence rates for AL.
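The projection view described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the polytope is given as an explicit, small list of vertices (hypothetical stand-ins for the feature expectations of deterministic policies), so the linear minimization step is a plain search over that list. In an actual AL setting that step would presumably require solving the MDP for a reward defined by the current gradient. The function name `frank_wolfe_projection` and the exact line search are choices made for this illustration only.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def frank_wolfe_projection(vertices, target, num_iters=200):
    """Approximately project `target` onto conv(vertices) with vanilla Frank-Wolfe.

    Minimizes f(x) = 0.5 * ||x - target||^2 over the convex hull of `vertices`.
    `vertices` is an explicit list of points (an assumption of this sketch).
    """
    x = list(vertices[0])  # start from an arbitrary vertex
    for _ in range(num_iters):
        # Gradient of f at x.
        grad = [xi - ti for xi, ti in zip(x, target)]
        # Linear minimization oracle: vertex minimizing <grad, v>.
        s = min(vertices, key=lambda v: dot(grad, v))
        d = [si - xi for si, xi in zip(s, x)]
        dd = dot(d, d)
        if dd == 0.0:
            break  # already at the vertex returned by the oracle
        # Exact line search for the quadratic objective, clipped to [0, 1].
        gamma = max(0.0, min(1.0, -dot(grad, d) / dd))
        if gamma == 0.0:
            break  # no descent direction towards any vertex
        x = [xi + gamma * di for xi, di in zip(x, d)]
    return x


# Hypothetical example: project a point outside the unit square onto it.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(frank_wolfe_projection(square, (2.0, 0.5)))  # prints [1.0, 0.5]
```

The sketch covers only the basic FW iteration; the "away steps" and stochastic variants mentioned in the abstract would change how the direction `d` is chosen and how the feature expectations are estimated, and are not shown here.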


Pages: 6720-6728

Page count: 9

27 references in total

[1] Abbeel P., 2004, Apprenticeship learning via inverse reinforcement learning, P1, DOI 10.1145/1015330.1015430

[2] [Anonymous], 2014, NIPS 2013 WORKSH GRE

[3] [Anonymous], 2013, Revisiting Frank-Wolfe: Projection-free sparse convex optimization

[4] Beck A., Teboulle M., 2004, A conditional gradient method with linear rate of convergence for solving convex linear systems, Mathematical Methods of Operations Research, 59 (2): 235-247

[5] Beck A., Shtern S., 2017, Linearly convergent away-step conditional gradient for non-strongly convex functions, Mathematical Programming, 164 (1-2): 1-27

[6] Canon M.D., Cullum C.D., 1968, A tight upper bound on rate of convergence of Frank-Wolfe algorithm, SIAM Journal on Control, 6 (4): 509-&

[7] Even-Dar E., 2003, J MACH LEARN RES, V5, P1

[8] Frank M., 1956, Naval Research Logistics Quarterly, V3, P95, DOI 10.1002/nav.3800030109

[9] Garber D., 2015, ICML

[10] Garber D., 2013, arXiv:1301.4666
