332 references in total
[1]
Agogino AK, Tumer K (2008) Analyzing and visualizing multiagent rewards in dynamic and stochastic domains. Autonomous Agents and Multi-Agent Systems 17:320-338
[2]
Ahamed TI, Borkar VS, Juneja S (2006) Adaptive importance sampling technique for Markov chains using stochastic approximation. Operations Research 54:489-504
[3]
Albrecht SV, Stone P (2018) Autonomous agents modelling other agents: A comprehensive survey and open problems. Artificial Intelligence 258:66-95
[4]
Alonso E, d'Inverno M, Kudenko D, Luck M, Noble J (2002) Learning in multi-agent systems. Knowledge Engineering Review 16:1-8
[5]
Åström KJ (1965) Optimal control of Markov processes with incomplete state information. Journal of Mathematical Analysis and Applications 10:174-205
[6]
Axelrod R, Hamilton WD (1981) The evolution of cooperation. Science 211:1390-1396
[7]
Baird L (1995) Residual algorithms: Reinforcement learning with function approximation. In: Machine Learning Proceedings 1995, pp 30-37
[8]
Becker R, Zilberstein S, Lesser V, Goldman CV (2004) Solving transition independent decentralized Markov decision processes. Journal of Artificial Intelligence Research 22:423-455
[9]
Bellemare MG, Naddaf Y, Veness J, Bowling M (2013) The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research 47:253-279
[10]
Bellman R (1957) A Markovian decision process. Journal of Mathematics and Mechanics 6:679-684