Reinforcement learning with immediate rewards and linear hypotheses

Cited by: 44
Authors
Abe, N
Biermann, AW
Long, PM
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] Duke Univ, Dept Comp Sci, Durham, NC 27708 USA
[3] Genome Inst Singapore, Singapore 117528, Singapore
Keywords
computational learning theory; reinforcement learning; immediate rewards; online learning; online algorithms; decision theory; dialogue systems;
DOI
10.1007/s00453-003-1038-1
CLC Number
TP31 [Computer Software];
Discipline Codes
081202 ; 0835 ;
Abstract
We consider the design and analysis of algorithms that learn from the consequences of their actions with the goal of maximizing their cumulative reward, when the consequence of a given action is felt immediately, and a linear function, which is unknown a priori, (approximately) relates a feature vector for each action/state pair to the (expected) associated reward. We focus on two cases: one in which a continuous-valued reward is (approximately) given by applying the unknown linear function, and another in which the probability of receiving the larger of two binary-valued rewards is obtained by applying it. For these cases we provide bounds on the per-trial regret for our algorithms that go to zero as the number of trials approaches infinity. We also provide lower bounds showing that this rate of convergence is nearly optimal.
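The setting in the abstract is what is now often called a linear contextual bandit: each trial, the learner sees a feature vector per action, picks one, and immediately observes a (noisy) reward that is approximately linear in the features. The following is a minimal illustrative sketch of that setting using an ε-greedy learner with a ridge-regression estimate of the reward weights; it is a hypothetical stand-in for exposition, not the algorithms analyzed in the paper, and all names and parameter values are assumptions:

```python
import numpy as np

def epsilon_greedy_linear_bandit(contexts, true_theta, n_rounds=2000,
                                 epsilon=0.05, noise=0.1, seed=0):
    """Immediate-reward learning with a linear hypothesis (sketch).

    Each round the learner sees one feature vector per action, picks an
    action, and observes reward = <true_theta, x> + Gaussian noise.  It
    maintains a ridge-regression estimate theta_hat and plays the greedy
    action, exploring uniformly with probability epsilon.  Returns the
    average per-trial regret against the best fixed action.
    """
    rng = np.random.default_rng(seed)
    d = contexts.shape[1]
    A = np.eye(d)           # ridge regularizer; accumulates x x^T
    b = np.zeros(d)         # accumulates reward-weighted features
    regret = 0.0
    best_reward = np.max(contexts @ true_theta)
    for _ in range(n_rounds):
        theta_hat = np.linalg.solve(A, b)       # ridge estimate of theta
        if rng.random() < epsilon:
            a = rng.integers(len(contexts))     # explore
        else:
            a = int(np.argmax(contexts @ theta_hat))  # exploit
        x = contexts[a]
        r = x @ true_theta + noise * rng.standard_normal()
        A += np.outer(x, x)
        b += r * x
        regret += best_reward - x @ true_theta
    return regret / n_rounds

# Example: 5 actions with 3-dimensional feature vectors
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))
theta = np.array([1.0, -0.5, 0.25])
avg_regret = epsilon_greedy_linear_bandit(X, theta)
```

With fixed ε the per-trial regret plateaus at roughly ε times the mean suboptimality gap; the paper's contribution is algorithms whose per-trial regret goes to zero as the number of trials grows, with matching lower bounds.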
Pages: 263 - 293
Page count: 31
Related Papers (50 total)
  • [21] Orientation-Preserving Rewards' Balancing in Reinforcement Learning
    Ren, Jinsheng
    Guo, Shangqi
    Chen, Feng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (11) : 6458 - 6472
  • [22] Deep Reinforcement Learning for Sporadic Rewards with Human Experience
    Sinha, Harshit
    PROCEEDINGS OF THE 2017 IEEE SECOND INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND COMMUNICATION TECHNOLOGIES (ICECCT), 2017,
  • [23] Tentative Exploration on Reinforcement Learning Algorithms for Stochastic Rewards
    Pena, Luis
    LaTorre, Antonio
    Pena, Jose-Maria
    Ossowski, Sascha
    HYBRID ARTIFICIAL INTELLIGENCE SYSTEMS, 2009, 5572 : 336 - 343
  • [24] Off-Policy Reinforcement Learning with Delayed Rewards
    Han, Beining
    Ren, Zhizhou
    Wu, Zuofan
    Zhou, Yuan
    Peng, Jian
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [25] Automatic Successive Reinforcement Learning with Multiple Auxiliary Rewards
    Fu, Zhao-Yang
    Zhan, De-Chuan
    Li, Xin-Chun
    Lu, Yi-Xing
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2336 - 2342
  • [26] Learning Circuit Placement Techniques through Reinforcement Learning with Adaptive Rewards
    Vassallo, Luke
    Bajada, Josef
    2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2024,
  • [27] Rewards Prediction-Based Credit Assignment for Reinforcement Learning With Sparse Binary Rewards
    Seo, Minah
    Vecchietti, Luiz Felipe
    Lee, Sangkeum
    Har, Dongsoo
    IEEE ACCESS, 2019, 7 : 118776 - 118791
  • [28] Split Q Learning: Reinforcement Learning with Two-Stream Rewards
    Lin, Baihan
    Bouneffouf, Djallel
    Cecchi, Guillermo
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 6448 - 6449
  • [29] State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning With Rewards
    Calvo-Fullana, Miguel
    Paternain, Santiago
    Chamon, Luiz F. O.
    Ribeiro, Alejandro
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (07) : 4275 - 4290
  • [30] Immediate Reinforcement in Delayed Reward Learning in Pigeons
    Winter, J.
    Perkins, C. C.
    JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR, 1982, 38 (02) : 169 - 179