Choice of discount rate in reinforcement learning with long-delay rewards

Cited by: 0

Authors:

LIN Xiangyang [1]

XING Qinghua [1]

LIU Fuxian [1]

Affiliations:

[1] Department of Air Defense and Anti-Missile, Air Force Engineering University

Source:

Journal of Systems Engineering and Electronics | 2022, Vol. 33, No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

Not available

Chinese Library Classification:

TP181 [Automated reasoning, machine learning];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

In the real world, most successes result from long-term effort. The reward for success is extremely high, but a long investment process is required before it arrives. People who are “myopic” value only short-term rewards and are unwilling to make early-stage investments, so they rarely achieve ultimate success or the high rewards that come with it. Similarly, for a reinforcement learning (RL) model with long-delay rewards, the discount rate determines how “farsighted” the agent is. To enable the trained agent to make a chain of correct choices and ultimately succeed, this paper first derives mathematically the feasible region of the discount rate, which satisfies the agent’s “farsightedness” requirement. Then, to avoid solving complicated implicit equations when choosing a feasible solution, a simple method is explored and verified by theoretical demonstration and mathematical experiments. Next, a series of RL experiments is designed and implemented to verify the validity of the theory. Finally, the model is extended from the finite process to the infinite process, and the validity of the extended model is verified by theory and experiments. The research not only reveals the significance of the discount rate but also provides a theoretical basis and a practical method for choosing the discount rate in future research.
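The idea that the discount rate sets an agent's "farsightedness" can be illustrated with a toy calculation. The sketch below is not the paper's derivation or its method; it assumes a hypothetical task in which the agent either quits for an immediate reward of 0 or pays a cost of 1 per step for 10 steps before receiving a delayed reward of 100. The function names and all numbers are illustrative assumptions. It locates, by bisection, the smallest discount rate at which investing looks better than quitting.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a fixed reward sequence."""
    return sum(r * gamma**t for t, r in enumerate(rewards))


def min_farsighted_gamma(invest_rewards, quit_reward, tol=1e-9):
    """Smallest gamma in [0, 1] at which investing beats quitting.

    Assumes the investment return crosses quit_reward once on [0, 1],
    which holds for the hypothetical reward sequence used below.
    Returns None if investing never beats quitting, even at gamma = 1.
    """
    lo, hi = 0.0, 1.0
    if discounted_return(invest_rewards, hi) <= quit_reward:
        return None
    if discounted_return(invest_rewards, lo) > quit_reward:
        return 0.0
    # Invariant: investing loses (or ties) at lo and wins at hi.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if discounted_return(invest_rewards, mid) > quit_reward:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical long-delay task: 10 costly steps, then one large reward.
invest = [-1.0] * 10 + [100.0]
quit_reward = 0.0

threshold = min_farsighted_gamma(invest, quit_reward)
myopic_value = discounted_return(invest, 0.5)
farsighted_value = discounted_return(invest, 0.9)

print(f"threshold gamma: {threshold:.4f}")
print(f"return at gamma=0.5: {myopic_value:.4f}")
print(f"return at gamma=0.9: {farsighted_value:.4f}")
```

In this toy setting, an agent with a discount rate below the threshold values the investment path less than quitting and so behaves "myopically", while one above the threshold prefers to invest. The feasible region in the paper concerns a chain of such choices and is obtained analytically rather than numerically.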

Citation

Pages: 381-392

Page count: 12
