A Coalitional Markov Decision Process Model for Dynamic Coalition Formation among Agents

被引：2

作者：

Ding, Shiyao ^{[1
]}

Lin, Donghui ^{[1
]}

机构：

[1] Kyoto Univ, Dept Social Informat, Kyoto, Japan

来源：

2020 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY (WI-IAT 2020) | 2020年

基金：

日本学术振兴会;

关键词：

Multi-agent systems; Coalition structure generation; Reinforcement learning; MDP; Q-learning; STRUCTURE GENERATION; EDGE CLOUD; INTERNET;

D O I：

10.1109/WIIAT50758.2020.00044

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

In multi-agent field, most studies on coalition formation problems assume static environments, but in real-world scenarios coalition formation problems can occur in dynamic environments. This creates the dynamic coalition formation problem where the coalition structure might change with time; our response is to propose a coalitional Markov decision process (CMDP). In CMDP, the dynamic process is modeled as a MDP where the agents observe the current state to formulate some coalitions and each coalition represents a unit that can take action to impact the environment. The environment probabilistically transfers to the next state to repeat the process. However, changing the coalition structure in the midst of MDP transitions incurs a cost that hinders the use of the classical algorithms invoked to solve MDP (e.g. Q-learning) to solve CMDP. Thus, we propose a novel algorithm coalitional Q-learning to solve CMDP and prove it can guarantee the convergence on optimal policies in CMDP; Furthermore, we apply the proposed algorithm to a dynamic formation problem in edge computing to guide edge servers to cooperatively perform tasks and thus verify the algorithm's effectiveness.

引用

页码：308 / 315

页数：8

共 31 条

[1]

[Anonymous], 2015, Reinforcement Learning: An Introduction

[2]

[Anonymous], 2003, P DIMACS WORKSH NETW

[3] Dynamic coalition formation and the core [J].

Arnold, T ;

Schwalbe, U .

JOURNAL OF ECONOMIC BEHAVIOR & ORGANIZATION, 2002, 49 (03) :363-380

[4] Cooperative Markov decision processes: time consistency, greedy players satisfaction, and cooperation maintenance [J].

Avrachenkov, Konstantin ;

Cottatellucci, Laura ;

Maggi, Lorenzo .

INTERNATIONAL JOURNAL OF GAME THEORY, 2013, 42 (01) :239-262

[5]

Busoniu L, 2010, STUD COMPUT INTELL, V310, P183

[6]

Chalkiadakis G., 2007, P 6 INT JOINT C AUT, P1, DOI DOI 10.1145/1329125.1329203

[7]

Chalkiadakis G., 2004, AAMAS, V3, P1090

[8]

Chang H, 2014, IEEE CONF COMPUT, P346, DOI 10.1109/INFCOMW.2014.6849256

[9]

Flammini M, 2018, PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS (AAMAS' 18), P1353

[10]

Gao Y., 2000, J COMPUTER RES DEV, V3

← 1 2 3 4 →