Reinforcement learning with immediate rewards and linear hypotheses

Cited by: 44
Authors
Abe, N
Biermann, AW
Long, PM
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] Duke Univ, Dept Comp Sci, Durham, NC 27708 USA
[3] Genome Inst Singapore, Singapore 117528, Singapore
Keywords
computational learning theory; reinforcement learning; immediate rewards; online learning; online algorithms; decision theory; dialogue systems;
DOI
10.1007/s00453-003-1038-1
CLC Number
TP31 [Computer Software];
Discipline Codes
081202 ; 0835 ;
Abstract
We consider the design and analysis of algorithms that learn from the consequences of their actions with the goal of maximizing their cumulative reward, when the consequence of a given action is felt immediately, and a linear function, which is unknown a priori, (approximately) relates a feature vector for each action/state pair to the (expected) associated reward. We focus on two cases: one in which a continuous-valued reward is (approximately) given by applying the unknown linear function, and another in which the probability of receiving the larger of two binary-valued rewards is obtained by applying it. For these cases we provide bounds on the per-trial regret for our algorithms that go to zero as the number of trials approaches infinity. We also provide lower bounds showing that this rate of convergence is nearly optimal.
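The setting in the abstract is what is now often called a linear contextual bandit: each trial, the learner sees a feature vector per action, picks one, and immediately observes a (noisy) reward that is approximately linear in the features. The following is a minimal illustrative sketch of that setting using an ε-greedy learner with a ridge-regression estimate of the reward weights; it is a hypothetical stand-in for exposition, not the algorithms analyzed in the paper, and all names and parameter values are assumptions:

```python
import numpy as np

def epsilon_greedy_linear_bandit(contexts, true_theta, n_rounds=2000,
                                 epsilon=0.05, noise=0.1, seed=0):
    """Immediate-reward learning with a linear hypothesis (sketch).

    Each round the learner sees one feature vector per action, picks an
    action, and observes reward = <true_theta, x> + Gaussian noise.  It
    maintains a ridge-regression estimate theta_hat and plays the greedy
    action, exploring uniformly with probability epsilon.  Returns the
    average per-trial regret against the best fixed action.
    """
    rng = np.random.default_rng(seed)
    d = contexts.shape[1]
    A = np.eye(d)           # ridge regularizer; accumulates x x^T
    b = np.zeros(d)         # accumulates reward-weighted features
    regret = 0.0
    best_reward = np.max(contexts @ true_theta)
    for _ in range(n_rounds):
        theta_hat = np.linalg.solve(A, b)       # ridge estimate of theta
        if rng.random() < epsilon:
            a = rng.integers(len(contexts))     # explore
        else:
            a = int(np.argmax(contexts @ theta_hat))  # exploit
        x = contexts[a]
        r = x @ true_theta + noise * rng.standard_normal()
        A += np.outer(x, x)
        b += r * x
        regret += best_reward - x @ true_theta
    return regret / n_rounds

# Example: 5 actions with 3-dimensional feature vectors
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))
theta = np.array([1.0, -0.5, 0.25])
avg_regret = epsilon_greedy_linear_bandit(X, theta)
```

With fixed ε the per-trial regret plateaus at roughly ε times the mean suboptimality gap; the paper's contribution is algorithms whose per-trial regret goes to zero as the number of trials grows, with matching lower bounds.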
Pages: 263 - 293
Page count: 31
Related Papers (50 total)
  • [21] Orientation-Preserving Rewards' Balancing in Reinforcement Learning
    Ren, Jinsheng
    Guo, Shangqi
    Chen, Feng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (11) : 6458 - 6472
  • [22] Deep Reinforcement Learning for Sporadic Rewards with Human Experience
    Sinha, Harshit
    PROCEEDINGS OF THE 2017 IEEE SECOND INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND COMMUNICATION TECHNOLOGIES (ICECCT), 2017,
  • [23] Tentative Exploration on Reinforcement Learning Algorithms for Stochastic Rewards
    Pena, Luis
    LaTorre, Antonio
    Pena, Jose-Maria
    Ossowski, Sascha
    HYBRID ARTIFICIAL INTELLIGENCE SYSTEMS, 2009, 5572 : 336 - 343
  • [24] Off-Policy Reinforcement Learning with Delayed Rewards
    Han, Beining
    Ren, Zhizhou
    Wu, Zuofan
    Zhou, Yuan
    Peng, Jian
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [25] Automatic Successive Reinforcement Learning with Multiple Auxiliary Rewards
    Fu, Zhao-Yang
    Zhan, De-Chuan
    Li, Xin-Chun
    Lu, Yi-Xing
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2336 - 2342
  • [26] Learning Circuit Placement Techniques through Reinforcement Learning with Adaptive Rewards
    Vassallo, Luke
    Bajada, Josef
    2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2024,
  • [27] Rewards Prediction-Based Credit Assignment for Reinforcement Learning With Sparse Binary Rewards
    Seo, Minah
    Vecchietti, Luiz Felipe
    Lee, Sangkeum
    Har, Dongsoo
    IEEE ACCESS, 2019, 7 : 118776 - 118791
  • [28] Split Q Learning: Reinforcement Learning with Two-Stream Rewards
    Lin, Baihan
    Bouneffouf, Djallel
    Cecchi, Guillermo
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 6448 - 6449
  • [29] State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning With Rewards
    Calvo-Fullana, Miguel
    Paternain, Santiago
    Chamon, Luiz F. O.
    Ribeiro, Alejandro
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (07) : 4275 - 4290
  • [30] Immediate Reinforcement in Delayed Reward Learning in Pigeons
    Winter, J.
    Perkins, C. C.
    JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR, 1982, 38 (02) : 169 - 179