14 records in total
[1]
[Anonymous], 1989, LEARNING DELAYED REW
[2]
[Anonymous], 1978, STOCHASTIC APPROXIMA
[3]
BARTO AG, 1991, COINS9157 U MASS TEC
[4]
BARTO AG, 1990, 1990 P CONN MOD SUMM
[5]
Bellman Richard, 1962, APPL DYNAMIC PROGRAM
[6]
CHAPMAN D, 1991, 1991 P INT JOINT C A, P726
[7]
Lin L.-J., 1992, MACHINE LEARNING, V8
[8]
MAHADEVAN, 1991, 1991 P NAT C AI, P768
[9]
Ross S.M., 2014, INTRO STOCHASTIC DYN
[10]
LEARNING CONTROL OF FINITE MARKOV-CHAINS WITH AN EXPLICIT TRADE-OFF BETWEEN ESTIMATION AND CONTROL [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1988, 18(05): 677-684