9 entries in total
- [1] XU T, LIANG Y, LAN G., CRPO: A new approach for safe reinforcement learning with convergence guarantee, International Conference on Machine Learning, pp. 11480-11491, (2021)
- [2] REDDY D S K, SAHA A, TAMILSELVAM S G, et al., Risk averse reinforcement learning for mixed multi-agent environments, Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems, pp. 2171-2173, (2019)
- [3] MANNOR S, SIMESTER D, SUN P, et al., Bias and variance approximation in value function estimates, Management Science, 53, 2, pp. 308-322, (2007)
- [4] FUJIMOTO S, VAN HOOF H, MEGER D., Addressing function approximation error in actor-critic methods, International Conference on Machine Learning, pp. 1587-1596, (2018)
- [5] VAN EDE F, NOBRE A C., Turning attention inside out: How working memory serves behavior, Annual Review of Psychology, 74, pp. 137-165, (2023)
- [6] IQBAL S, SHA F., Actor-attention-critic for multi-agent reinforcement learning, International Conference on Machine Learning, pp. 2961-2970, (2019)
- [7] PARNIKA P, DIDDIGI R B, DANDA S K R, et al., Attention actor-critic algorithm for multi-agent constrained co-operative reinforcement learning, (2021)
- [8] HAARNOJA T, ZHOU A, ABBEEL P, et al., Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, International Conference on Machine Learning, pp. 1861-1870, (2018)
- [9] ZHOU M, LIU Z, SUI P, et al., Learning implicit credit assignment for cooperative multi-agent reinforcement learning, Advances in Neural Information Processing Systems, 33, pp. 11853-11864, (2020)