Deep reinforcement learning for multi-objective game strategy selection

Cited by: 4
Authors
Jiang, Ruhao [1 ,3 ,5 ]
Deng, Yanchen [2 ]
Chen, Yingying [1 ,3 ]
Luo, He [1 ,3 ,4 ]
An, Bo [2 ]
Affiliations
[1] Hefei Univ Technol, Sch Management, Hefei 230009, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[3] Minist Educ, Key Lab Proc Optimizat & Intelligent Decis Making, Hefei 230009, Peoples R China
[4] Engn Res Ctr Intelligent Management Aerosp Syst, Hefei 230009, Anhui Province, Peoples R China
[5] Hefei Comprehens Natl Sci Ctr, Hefei 230009, Anhui Province, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-objective game; Strategy selection; Preference vector; Deep reinforcement learning; NONDOMINATED EQUILIBRIUM SOLUTIONS; ZERO-SUM GAME; MATRIX GAMES;
DOI
10.1016/j.cor.2024.106683
Chinese Library Classification
TP39 [Computer applications];
Discipline codes
081203; 0835;
Abstract
Multi-objective game (MOG) is a fundamental model for decision-making problems in which each player must consider multi-dimensional payoffs that reflect different objectives. Typically, solving an MOG involves refining the set of equilibrium strategies, a task known as MOG strategy selection (MOGS). However, existing MOG algorithms allow only one metric for MOGS, which limits their application in real-world scenarios where the players may have different preferences over multiple metrics. In this paper, we first develop a preference-based MOGS framework that encompasses multiple metrics with different preferences. Based on this framework, we introduce the concept of comprehensive evaluation value (CEV) to evaluate the quality of a strategy set given the preference over each metric. Using CEV as a reward signal, we formulate the problem of finding the optimal strategy set as a Markov decision process, and use deep reinforcement learning to train a policy for MOG strategy selection given the metrics and the corresponding preferences. Specifically, we combine a rational strategy filtering procedure with a Transformer-based encoder-decoder policy network to refine the strategies given the preferences, and then train the policy network with a revised REINFORCE algorithm. In addition, we introduce variable beam search decoding to improve the quality of a rollout by keeping track of the most promising strategy sets and choosing the best one. We benchmark our algorithm on MOG instances generated by GAMUT, and extensive experiments demonstrate that, across different preferences, our algorithm generates strategy sets significantly better than the state-of-the-art baselines with lower computational overhead. Furthermore, we evaluate our approach on real-world problems, showing substantial advantages in both performance and runtime.
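The abstract describes scoring a candidate strategy set by a preference-weighted comprehensive evaluation value (CEV) and using it as the reward in a REINFORCE-style policy update. A minimal sketch of those two ingredients follows; the weighted-sum form of the CEV, the function names, and the plain baseline-corrected REINFORCE step are illustrative assumptions for exposition, not the paper's exact "revised REINFORCE" or CEV definitions:

```python
import numpy as np

def cev(metric_values, preferences):
    """Comprehensive evaluation value of a strategy set, modeled here
    (as an assumption) as the preference-weighted sum of metric scores.
    `preferences` is the preference vector over the metrics."""
    m = np.asarray(metric_values, dtype=float)
    w = np.asarray(preferences, dtype=float)
    w = w / w.sum()  # normalize so the preferences sum to 1
    return float(np.dot(w, m))

def reinforce_update(theta, grad_log_prob, reward, baseline, lr=0.01):
    """One baseline-corrected REINFORCE step:
    theta <- theta + lr * (reward - baseline) * grad(log pi(a|s))."""
    return theta + lr * (reward - baseline) * grad_log_prob

# Toy usage: two metrics scored for one candidate strategy set,
# CEV used as the scalar reward for a policy-gradient step.
reward = cev([0.8, 0.4], preferences=[3.0, 1.0])   # weights 0.75 / 0.25
theta = reinforce_update(np.zeros(2), np.array([1.0, -1.0]),
                         reward=reward, baseline=0.5, lr=0.1)
```

In the paper's full pipeline this scalar reward would be computed per rollout of the encoder-decoder policy, with variable beam search keeping the top-scoring strategy sets at decode time.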
Pages: 16
Cited references (60 records; first 10 shown)
[1] Avigad, G. (2011). IEEE Conference on Computational Intelligence and Games, p. 166. DOI 10.1109/CIG.2011.6032003
[2] Belgana, A., Rimal, B. P., & Maier, M. (2015). Open Energy Market Strategies in Microgrids: A Stackelberg Game Approach Based on a Hybrid Multiobjective Evolutionary Algorithm. IEEE Transactions on Smart Grid, 6(3), 1243-1252.
[3] Bello, I., et al. (2016). 5th International Conference on Learning Representations.
[4] Bengio, Y., Lodi, A., & Prouvost, A. (2021). Machine learning for combinatorial optimization: A methodological tour d'horizon. European Journal of Operational Research, 290(2), 405-421.
[5] Blackwell, D. (1956). Pacific Journal of Mathematics, 6, 1. DOI 10.2140/pjm.1956.6.1
[6] Boyd, N. T., Gabriel, S. A., Rest, G., & Dumm, T. (2023). Generalized Nash equilibrium models for asymmetric, non-cooperative games on line graphs: Application to water resource systems. Computers & Operations Research, 154.
[7] Chandra, S., & Aggarwal, A. (2015). On solving matrix games with pay-offs of triangular fuzzy numbers: Certain observations and generalizations. European Journal of Operational Research, 246(2), 575-581.
[8] Chen, X. Y. (2019). Advances in Neural Information Processing Systems, Vol. 32.
[9] Crespi, G. P., Kuroiwa, D., & Rocca, M. (2020). Robust Nash equilibria in vector-valued games with uncertainty. Annals of Operations Research, 289(2), 185-193.
[10] Das, C. B., & Roy, S. K. (2013). Fuzzy based GA to multi-objective entropy bimatrix game. OPSEARCH, 50(1), 125-140.