14 entries in total
- [1] [Anonymous], 1989, LEARNING DELAYED REW
- [2] [Anonymous], 1978, STOCHASTIC APPROXIMA
- [3] Barto A.G., 1991, COINS9157 U MASS TEC
- [4] Barto A.G., 1990, 1990 P CONN MOD SUMM
- [5] Bellman R., 1962, APPL DYNAMIC PROGRAM
- [6] Chapman D., 1991, 1991 P INT JOINT C A, P726
- [7] Lin L.-J., 1992, MACHINE LEARNING, V8
- [8] Mahadevan, 1991, 1991 P NAT C AI, P768
- [9] Ross S.M., 2014, INTRO STOCHASTIC DYN
- [10] [Anonymous], 1988, LEARNING CONTROL OF FINITE MARKOV-CHAINS WITH AN EXPLICIT TRADE-OFF BETWEEN ESTIMATION AND CONTROL, IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, V18(5), P677-684