共 332 条
- [1] Agogino AK(2008)Analyzing and visualizing multiagent rewards in dynamic and stochastic domains Autonomous Agents and Multi-Agent Systems 17 320-338
- [2] Tumer K(2006)Adaptive importance sampling technique for markov chains using stochastic approximation Operations Research 54 489-504
- [3] Ahamed TI(2018)Autonomous agents modelling other agents: A comprehensive survey and open problems Artificial Intelligence 258 66-95
- [4] Borkar VS(2002)Learning in multi-agent systems Knowledge Engineering Review 16 1-8
- [5] Juneja S(1965)Optimal control of Markov processes with incomplete state information Journal of Mathematical Analysis and Applications 10 174-205
- [6] Albrecht SV(1981)The evolution of cooperation Science 211 1390-1396
- [7] Stone P(1995)Residual algorithms: Reinforcement learning with function approximation Machine Learning Proceedings 1995 30-37
- [8] Alonso E(2004)Solving transition independent decentralized Markov decision processes Journal of Artificial Intelligence Research 22 423-455
- [9] D’inverno M(2013)The arcade learning environment: An evaluation platform for general agents Journal of Artificial Intelligence Research 47 253-279
- [10] Kudenko D(1957)A Markovian decision process Journal of Mathematics and Mechanics 6 679-684