Risk-sensitive REINFORCE: A Monte Carlo Policy Gradient Algorithm for Exponential Performance Criteria

Cited by: 3
Authors
Noorani, Erfaun [1 ,2 ,3 ]
Baras, John S. [1 ,2 ]
Affiliations
[1] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
[2] Univ Maryland, Inst Syst Res ISR, College Pk, MD 20742 USA
[3] Clark Sch Engn, College Pk, MD 20742 USA
Source
2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC) | 2021
Keywords
STOCHASTIC LINEAR-SYSTEMS; GAMES;
DOI
10.1109/CDC45484.2021.9683645
CLC number
TP [automation technology, computer technology]
Subject classification code
0812
Abstract
Risk is an inherent component of any decision-making process under uncertainty, and failure to account for risk can lead to significant performance degradation. We present a policy gradient theorem for the risk-sensitive control "exponential of integral" criterion and propose a risk-sensitive Monte Carlo policy gradient algorithm. Our simulations, together with our theoretical analysis, show that using the exponential criterion with an appropriately chosen risk parameter not only yields a risk-sensitive policy but also reduces variance during the learning process and accelerates learning, which in turn produces a policy with higher expected return; that is, risk sensitivity leads to sample efficiency and improved performance.
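The abstract describes a REINFORCE-style update in which the Monte Carlo return is passed through the exponential criterion, i.e., the policy is optimized for (1/β) log E[exp(β R)] rather than E[R]. A minimal sketch of that idea on a hypothetical two-armed bandit (the environment, step size, and risk parameter below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Hypothetical two-armed bandit, for illustration only:
# arm 0 ("safe") always pays 1; arm 1 ("risky") pays 2 with
# probability 0.8 and -1 otherwise, so it has the higher mean (1.4)
# but also higher variance.
rng = np.random.default_rng(0)

def pull(arm):
    if arm == 0:
        return 1.0
    return 2.0 if rng.random() < 0.8 else -1.0

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

beta = -2.0          # beta < 0: risk-averse exponential criterion
lr = 0.1             # step size (illustrative choice)
theta = np.zeros(2)  # parameters of a softmax policy over the two arms

for _ in range(5000):
    p = softmax(theta)
    a = rng.choice(2, p=p)
    r = pull(a)
    grad_log_pi = -p.copy()
    grad_log_pi[a] += 1.0  # d/dtheta of log pi(a) for a softmax policy
    # Risk-sensitive REINFORCE step: the plain return R is replaced by
    # (1/beta) * exp(beta * R), a Monte Carlo estimate (up to a positive
    # normalising constant) of the gradient of (1/beta) log E[exp(beta R)].
    theta += lr * (1.0 / beta) * np.exp(beta * r) * grad_log_pi

p_final = softmax(theta)

# Certainty equivalents (1/beta) log E[exp(beta R)] of the two arms:
ce_safe = 1.0
ce_risky = (1.0 / beta) * np.log(
    0.8 * np.exp(beta * 2.0) + 0.2 * np.exp(beta * -1.0)
)
```

With β < 0 the certainty equivalent of the risky arm drops below that of the safe arm even though the risky arm has the higher mean, so the learned policy should concentrate on the safe arm; as β → 0 the criterion recovers the risk-neutral expected return.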
Pages: 1522-1527
Page count: 6
Related papers
2 records
  • [1] Embracing Risk in Reinforcement Learning: The Connection between Risk-Sensitive Exponential and Distributionally Robust Criteria
    Noorani, Erfaun
    Baras, John S.
    2022 AMERICAN CONTROL CONFERENCE, ACC, 2022, : 2703 - 2708
  • [2] Discrete-time Decentralized Control using the Risk-sensitive Performance Criterion in the Large Population Regime: A Mean Field Approach
    Moon, Jun
    Basar, Tamer
    2015 AMERICAN CONTROL CONFERENCE (ACC), 2015, : 4779 - 4784