4 entries in total
- [2] On the Existence of Fixed Points for Approximate Value Iteration and Temporal-Difference Learning. Journal of Optimization Theory and Applications, 2000, 105: 589-608.
- [3] Intentionally-underestimated value function at terminal state for temporal-difference learning with mis-designed reward. Results in Control and Optimization, 2025, 18.
- [4] A temporal-difference learning method using Gaussian state representation for continuous state space problems. Japanese Society for Artificial Intelligence, (29): 157-167.