Neural Regret-Matching for Distributed Constraint Optimization Problems

被引：0

作者：

Deng, Yanchen ^{[1
]}

Yu, Runshen ^{[1
]}

Wang, Xinrun ^{[1
]}

An, Bo ^{[1
]}

机构：

[1] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore

来源：

PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021 | 2021年

基金：

新加坡国家研究基金会;

关键词：

ALGORITHMS; ADOPT;

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Distributed constraint optimization problems (DCOPs) are a powerful model for multi-agent coordination and optimization, where information and controls are distributed among multiple agents by nature. However, most of incomplete algorithms for DCOPs are context-free, i.e., agents make a decision purely based on the state of their neighbors, which makes them prone to get trapped in poor local convergence. On the other hand, context-based algorithms use tables to exactly store all the information (e.g., costs, confidence bounds), which limits their scalability. This paper tackles the limitation by incorporating deep neural networks in solving DCOPs for the first time and presents a neural context-based sampling scheme built upon regret-matching. In the algorithm, each agent trains a neural network to approximate the regret related to its local problem under current context and performs sampling according to the estimated regret. Furthermore, to ensure exploration, we propose a regret rounding scheme that rounds small regret values to positive numbers. We theoretically show the regret bound of our algorithm and extensive evaluations indicate that our algorithm can scale up to large-scale DCOPs and significantly outperform the state-of-the-art methods.

引用

页码：146 / 153

页数：8

共 31 条

[1] Emergence of scaling in random networks [J].

Barabási, AL ;

Albert, R .

SCIENCE, 1999, 286 (5439) :509-512

[2]

Blackwell David., 1956, Pacific Journal of Mathematics, V6, P1, DOI [10.2140/pjm.1956.6.1, DOI 10.2140/PJM.1956.6.1]

[3] A unifying framework for iterative approximate best-response algorithms for distributed constraint optimization problems [J].

Chapman, Archie C. ;

Rogers, Alex ;

Jennings, Nicholas R. ;

Leslie, David S. .

KNOWLEDGE ENGINEERING REVIEW, 2011, 26 (04) :411-444

[4] Distributed Gibbs: A Linear-Space Sampling-Based DCOP Algorithm [J].

Duc Thien Nguyen ;

Yeoh, William ;

Lau, Hoong Chuin ;

Zivan, Roie .

JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2019, 64 :705-748

[5]

Farinelli Alessandro, AAMAS, P639

[6]

Fioretto F, 2017, AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, P999

[7] Distributed Constraint Optimization Problems and Applications: A Survey [J].

Fioretto, Ferdinando ;

Pontelli, Enrico ;

Yeoh, William .

JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2018, 61 :623-698

[8] An Introduction to Deep Reinforcement Learning [J].

Francois-Lavet, Vincent ;

Henderson, Peter ;

Islam, Riashat ;

Bellemare, Marc G. ;

Pineau, Joelle .

FOUNDATIONS AND TRENDS IN MACHINE LEARNING, 2018, 11 (3-4) :219-354

[9]

Freuder E.C., 1985, P 9 INT JOINT C ART, P1076

[10] A simple adaptive procedure leading to correlated equilibrium [J].

Hart, S ;

Mas-Colell, A .

ECONOMETRICA, 2000, 68 (05) :1127-1150

← 1 2 3 4 →